Abstract



Shannon's Entropy Power Inequality can be viewed as characterizing the minimum differential
entropy achievable by the sum of two independent random variables with fixed differential
entropies. The entropy power inequality has played a key role in resolving a number of problems
in information theory. It is therefore interesting to examine the existence of a similar inequality
for discrete random variables. In this paper we obtain an entropy power inequality for random
variables taking values in an abelian group of order $2^n$, i.e. for such a group $G$ we explicitly
characterize the function $f_G(x,y)$ giving the minimum entropy of the sum of two independent
$G$-valued random variables with respective entropies $x$ and $y$. Random variables achieving the
extremum in this inequality are thus the analogs of Gaussians in this case, and these are also
determined. It turns out that $f_G(x,y)$ is convex in $x$ for fixed $y$ and, by symmetry, convex in $y$
for fixed $x$. This is a generalization to abelian groups of order $2^n$ of the result known as Mrs.
Gerber's Lemma.

1 Introduction

The Entropy Power Inequality (EPI) relates to the so-called "entropy power" of $\mathbb{R}^n$-valued random
variables having densities with well-defined differential entropies. It was first proposed by Shannon
in 1948 [1], who also gave sufficient conditions for equality to hold. The entropy power of an
$\mathbb{R}^n$-valued random variable $X$ is defined as the per-coordinate variance of a circularly symmetric
$\mathbb{R}^n$-valued Gaussian random variable with the same differential entropy as $X$.



Theorem 1.1 (Entropy Power Inequality). For an $\mathbb{R}^n$-valued random variable $X$, the entropy
power of $X$ is defined to be

$$N(X) = \frac{1}{2\pi e}\, e^{\frac{2}{n} h(X)}, \tag{1}$$

where $h(X)$ stands for the differential entropy of $X$. Now let $X$ and $Y$ be independent $\mathbb{R}^n$-valued
random variables. The EPI states that entropy power is a super-additive function, that is

$$N(X) + N(Y) \le N(X + Y), \tag{2}$$

with equality if and only if $X$ and $Y$ are Gaussian with proportional covariance matrices.

Shannon used a variational argument to show that $X$ and $Y$ being Gaussian with proportional
covariance matrices and having the required entropies is a stationary point for $h(X + Y)$, but this
did not exclude the possibility of it being a local minimum or a saddle point. The first rigorous
proof of (2) was given by Stam [2] in 1959, based on an identity communicated to him by N. G.
De Bruijn, which couples Fisher information with differential entropy. Stam's proof was further
simplified by Blachman [3]. Lieb [4] gave a proof of the EPI using a strengthened Young's inequality.
More recently, Verdú and Guo [5] gave a proof without invoking Fisher information, by using the
relationship between mutual information and minimum mean square error (MMSE) for Gaussian
channels. Rioul [6] managed to give a proof sidestepping Fisher information as well as MMSE
estimates.

The EPI has played a key role in the solution of a number of communication problems. It
is generally used to prove converses of coding theorems when Fano's inequality is insufficient to
prove optimality. Some famous examples are Bergmans's solution to the Gaussian broadcast
channel problem [7], Leung-Yan-Cheong and Hellman's determination of the secrecy capacity of
a Gaussian wire-tap channel [8], Ozarow's solution to the scalar Gaussian source two-description
problem [9], Oohama's solution to the quadratic Gaussian CEO problem [10], and recently
Weingarten, Steinberg and Shamai's solution to the multiple-input multiple-output Gaussian broadcast
channel problem [11].

The EPI has been generalized in a number of ways. Costa [12] strengthened the inequality when
one of the random variables is Gaussian. In particular, Costa showed that if independent Gaussian
noise is added to an arbitrary multivariate random variable, the entropy power of the resulting
random variable is concave in the variance of the added noise. Dembo [13] reduced Costa's
inequality to an equivalent inequality in terms of Fisher information and proved this inequality. Villani
[14] further simplified Dembo's proof. Zamir and Feder [15] generalized the scalar EPI using linear
transformations of random variables. T. Liu and Viswanath [16] obtained a generalization of the
EPI by considering a covariance-constrained optimization problem, motivated by the problems of
the capacity region of the vector Gaussian broadcast channel and of distributed source coding with
a single quadratic distortion constraint. R. Liu, T. Liu, Poor and Shamai [17] gave a vector
generalization of Costa's EPI. The EPI for general independent random variables and the corresponding
Fisher information inequalities have also been used to prove strong versions of the central limit
theorem, with convergence in relative entropy. Artstein, Ball, Barthe, and Naor [18] showed that
the non-Gaussianness (divergence with respect to a Gaussian random variable with identical first
and second moments) of the sum of independent and identically distributed random variables is
monotonically non-increasing. Simplified proofs of this result were later given by Tulino and Verdú
[19] and by Madiman and Barron [20].

There have also been several attempts to obtain discrete versions of the EPI. For the binary
symmetric channel (BSC), Wyner and Ziv [21], [22] proved a result called Mrs. Gerber's Lemma
(MGL), see Theorem 1.2 below, which was extended to arbitrary binary input-output channels by
Witsenhausen [23]. Shamai and Wyner [24] used MGL to give a binary analog of the EPI. Harremoës
and Vignat [25] proved a version of the EPI for binomial random variables with parameter $\tfrac{1}{2}$.
Sharma, Das and Muthukrishnan [26] expanded the class of binomial random variables for which
Harremoës's EPI holds. Johnson and Yu [27] gave a version of the EPI for discrete random variables
using the notion of Rényi thinning.

In this paper we take a different approach towards getting a discrete analog of the EPI. Notice
that even though the EPI is interpreted as an inequality in terms of the "entropy power" of random
variables, it is essentially a sharp lower bound on the differential entropy of a sum of independent
random variables in terms of their individual differential entropies. If we are dealing with discrete
random variables, as long as the "sum" operation is defined we can arrive at an analogous lower bound,
except with entropies instead of differential entropies. A natural case to consider is when the random
variables take values in an abelian group $G$, and to define the function
$f_G : [0, \log |G|] \times [0, \log |G|] \to [0, \log |G|]$ by

$$f_G(x,y) = \min_{H(X)=x,\; H(Y)=y} H(X + Y). \tag{3}$$

We can then exploit the group structure and try to arrive at the explicit form of $f_G$.
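
As a hedged numerical illustration of definition (3), the sketch below evaluates $H(X+Y)$ for two independent random variables on a cyclic group $\mathbb{Z}_n$ by cyclic convolution of their distributions. The helper names `entropy` and `cyclic_conv` and the example distributions are our own assumptions, not taken from the paper, and natural logarithms are used. A single pair of distributions only gives an upper bound on $f_G(H(X), H(Y))$; the minimization itself is not attempted here.

```python
import math

def entropy(p):
    """Shannon entropy in nats of a probability vector p."""
    return -sum(t * math.log(t) for t in p if t > 0)

def cyclic_conv(p, q):
    """Distribution of X + Y on Z_n for independent X ~ p and Y ~ q."""
    n = len(p)
    return [sum(p[i] * q[(k - i) % n] for i in range(n)) for k in range(n)]

# Illustrative distributions on Z_3 (arbitrary choices, not from the paper).
p_x = [0.7, 0.2, 0.1]
p_y = [0.5, 0.5, 0.0]
p_z = cyclic_conv(p_x, p_y)
h_x, h_y, h_z = entropy(p_x), entropy(p_y), entropy(p_z)
```

For these inputs `h_z` is at least `max(h_x, h_y)`, which is the trivial lower bound that any candidate for $f_G$ must respect.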

A closely related function has been studied by Tao [28], where the sumset theory of Plünnecke
and Ruzsa is reinterpreted using entropy as a proxy for the cardinality of a set. The
sumset and inverse sumset inequalities in [28] were further proved for differential entropy in [30].

Let us now consider two special cases: $G = \mathbb{Z}_2$ and $G = \mathbb{R}$. In the first case we note that on $\mathbb{Z}_2$,
there is a unique distribution (up to rotation) corresponding to a fixed value of entropy. We can use
this to simplify $f_{\mathbb{Z}_2}$ by writing it in terms of the inverse of binary entropy, $h^{-1} : [0, \log 2] \to [0, \tfrac{1}{2}]$:

$$f_{\mathbb{Z}_2}(x,y) = h\big(h^{-1}(x) \star h^{-1}(y)\big), \tag{4}$$

where $a \star b := a(1-b) + b(1-a)$. This is precisely the function for which Wyner and Ziv's MGL is
applicable; in fact we can restate MGL in terms of $f_{\mathbb{Z}_2}$:

Theorem 1.2 (Mrs. Gerber's Lemma). $f_{\mathbb{Z}_2}(x,y)$ is convex in $y$ for a fixed $x$, and by symmetry
convex in $x$ for a fixed $y$.
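
The following is a minimal numerical sketch of the function $h(h^{-1}(x) \star h^{-1}(y))$ on $\mathbb{Z}_2$ and of the convexity stated in Mrs. Gerber's Lemma. It assumes natural logarithms, computes $h^{-1}$ by bisection, and checks central second differences in $y$ on a small grid; the function names and grid are illustrative choices, and a finite grid check is of course not a proof.

```python
import math

LOG2 = math.log(2)

def h(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def h_inv(x):
    """Inverse of h restricted to [0, 1/2], computed by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(80):
        mid = (lo + hi) / 2
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def star(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1 - b) + b * (1 - a)

def f_z2(x, y):
    """h(h^{-1}(x) * h^{-1}(y)) for x, y in [0, log 2]."""
    return h(star(h_inv(x), h_inv(y)))

def second_difference_in_y(x, y, d=0.01):
    """Central second difference of y -> f_z2(x, y); nonnegative if convex."""
    return f_z2(x, y - d) + f_z2(x, y + d) - 2 * f_z2(x, y)

grid = [0.05 + 0.05 * i for i in range(12)]  # points inside (0, log 2)
min_diff = min(second_difference_in_y(x, y) for x in grid for y in grid)
```

On this grid `min_diff` is nonnegative up to rounding, as convexity in $y$ for fixed $x$ requires.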

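For the case of $G = \mathbb{R}$ it is worthwhile to note that the function $f_{\mathbb{R}}$, which can be written
explicitly as

$$f_{\mathbb{R}}(x,y) = \frac{1}{2}\log\left(e^{2x} + e^{2y}\right), \tag{5}$$

satisfies the convexity property described by MGL. In fact $f_{\mathbb{R}}$ is jointly convex in $(x, y)$. We can
however easily check that $f_{\mathbb{Z}_2}$ is not jointly convex in $(x, y)$, since for $0 < x < \log 2$

$$f_{\mathbb{Z}_2}(x,x) > x = \frac{x}{\log 2}\, f_{\mathbb{Z}_2}(\log 2, \log 2) + \left(1 - \frac{x}{\log 2}\right) f_{\mathbb{Z}_2}(0,0).$$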
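
The contrast between the real case and the $\mathbb{Z}_2$ case can be probed numerically. The sketch below is an illustration under our own assumptions (natural logarithms, bisection for $h^{-1}$, a fixed random seed): it tests midpoint convexity of $\tfrac{1}{2}\log(e^{2x}+e^{2y})$ on random pairs of points and compares $h(h^{-1}(x) \star h^{-1}(x))$ with the chord value $x$ along the diagonal.

```python
import math
import random

LOG2 = math.log(2)

def h(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def h_inv(x):
    """Inverse of h on [0, 1/2] by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(80):
        mid = (lo + hi) / 2
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def f_z2(x, y):
    """h(h^{-1}(x) * h^{-1}(y)) with a * b = a(1 - b) + b(1 - a)."""
    a, b = h_inv(x), h_inv(y)
    return h(a * (1 - b) + b * (1 - a))

def f_r(x, y):
    """(1/2) log(e^{2x} + e^{2y}), the real-valued counterpart."""
    return 0.5 * math.log(math.exp(2 * x) + math.exp(2 * y))

# Midpoint convexity of f_r on random pairs of points (fixed seed).
rng = random.Random(0)
worst_gap = float("inf")
for _ in range(1000):
    x1, y1, x2, y2 = (rng.uniform(-3, 3) for _ in range(4))
    mid = f_r((x1 + x2) / 2, (y1 + y2) / 2)
    worst_gap = min(worst_gap, (f_r(x1, y1) + f_r(x2, y2)) / 2 - mid)

# Along the diagonal, f_z2(x, x) lies strictly above the chord value x.
diagonal_excess = [f_z2(x, x) - x for x in (0.1, 0.3, 0.5, 0.65)]
```

A strictly positive `diagonal_excess` at interior points is exactly what rules out joint convexity of the $\mathbb{Z}_2$ function, since the chord between $(0,0)$ and $(\log 2, \log 2)$ takes the value $x$ at $(x,x)$.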
It seems natural to make the following conjecture:

Conjecture 1 (Generalized MGL). If $G$ is a finite abelian group, then $f_G(x,y)$ is convex in $x$ for
a fixed $y$, and by symmetry convex in $y$ for a fixed $x$.

Witsenhausen [23] and Ahlswede and Körner [31] attempted to generalize MGL by defining $g(x)$
to be the minimum output entropy of a channel subject to a fixed input entropy $x$. They showed
that $g(x)$ is convex for all binary-input binary-output channels, but that counterexamples to this
convexity exist for other channels. They resolved this issue by providing a version of MGL based on
the convex envelope of $g(x)$. Our function $f_G(x, y)$ can be thought of as related to the $g$ function
in this line of work, but it differs in the key aspect that the 'channel' is not fixed. To connect to
this line of work, we can think of the capacity of the channel as being fixed (subject to it being an
additive noise channel). We are then looking at the worst possible (in terms of minimum mutual
information $I(X + Y; X)$) input and channel distributions, while fixing the input entropy and the
channel capacity.

We have carried out simulations to test Conjecture 1 for $\mathbb{Z}_3$ and $\mathbb{Z}_5$ and it appears to hold for
these groups. In this paper we prove Conjecture 1 for all abelian groups $G$ of order $2^n$. In fact
we arrive at an explicit description of $f_G$ in terms of $f_{\mathbb{Z}_2}$ for such groups. We also characterize
those distributions where the minimum entropy is attained; these distributions are in this sense
analogous to Gaussians in the real case. Our results support the intuition that to minimize the
entropy of the sum, the random variables $X$ and $Y$ should be supported on the smallest possible
subgroup of $G$ (or cosets of the same) which can support them while satisfying the constraints
$H(X) = x$ and $H(Y) = y$.

The structure of the document is as follows. In section 2 we consider the function $f_{\mathbb{Z}_2}$ and
derive certain lemmas regarding the behaviour of $f_{\mathbb{Z}_2}$ along lines passing through the origin. In
section 3, we use the preceding lemmas to explicitly compute $f_{\mathbb{Z}_4}$. This can be thought of as the
induction step toward evaluating $f_{\mathbb{Z}_{2^n}}$. In section 4 we use induction and determine the form of
$f_{\mathbb{Z}_{2^n}}$. In section 5 we show that if $G$ is abelian and of order $2^n$, then $f_G = f_{\mathbb{Z}_{2^n}}$. Since $f_G$ is
explicitly determined for all abelian groups of order $2^n$ we have in effect proved an EPI for such
groups. Further, the $f_G$ we find verifies Conjecture 1 and so proves MGL for all abelian groups of
order $2^n$. In section 6 we provide some generalizations of our result that are likely to be of interest.
Notably, we study the minimum entropy of a sum of $k > 2$ independent $G$-valued random variables
of fixed entropies for $G$ of order $2^n$, and give an iterative expression to compute this minimum in
terms of $f_G$.



2 Preliminary Inequalities 

In this section we prove a few key lemmas which are needed to prove our EPI and MGL for $\mathbb{Z}_4$, then
for $\mathbb{Z}_{2^n}$, and finally for abelian groups $G$ of order $2^n$. Consider $f : [0,\log 2] \times [0,\log 2] \to [0,\log 2]$
given by

$$f(x,y) = h\big(h^{-1}(x) \star h^{-1}(y)\big).$$

Of course $f = f_{\mathbb{Z}_2}$, where $f_{\mathbb{Z}_2}$ is defined in equation (3), but it is convenient to drop the subscript
in this section.

For our first lemma, we consider lines of slope $0 < \theta < \infty$ passing through the origin. The
result we wish to prove is:

Lemma 2.1. $\frac{\partial f}{\partial x}$ strictly decreases along lines through the origin having slope $\theta$, where $0 < \theta < \infty$.

Remark 1. When $\theta = 0$, $\frac{\partial f}{\partial x}$ is constant and is equal to 1, and when $\theta = +\infty$, $\frac{\partial f}{\partial x}$ is constant and
equal to 0. The above lemma claims that for all other values $\theta \in (0, \infty)$, $\frac{\partial f}{\partial x}(x, \theta x)$ strictly decreases
in $x$.

Proof. For the proof, refer to Appendix A. □
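
A quick numerical sanity check of Lemma 2.1 can be sketched as follows. Writing $x = h(p)$, $y = h(q)$ and $r = p(1-q) + q(1-p)$, the chain rule gives $\partial f/\partial x = (1-2q)\log\frac{1-r}{r} \big/ \log\frac{1-p}{p}$; the code evaluates this along a few rays $y = \theta x$. The slopes, grid, and function names are our own illustrative choices, and the check does not replace the proof in the appendix.

```python
import math

def h(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def h_inv(x):
    """Inverse of h on [0, 1/2] by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(80):
        mid = (lo + hi) / 2
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def df_dx(x, y):
    """Partial derivative of f in x at an interior point (x, y) of (0, log 2)^2."""
    p, q = h_inv(x), h_inv(y)
    r = p * (1 - q) + q * (1 - p)
    return (1 - 2 * q) * math.log((1 - r) / r) / math.log((1 - p) / p)

def derivative_along_ray(theta, num=20):
    """Values of df/dx at points (x, theta * x), listed with x increasing."""
    x_max = 0.9 * math.log(2) / max(1.0, theta)  # keep both coordinates below log 2
    xs = [x_max * (i + 1) / num for i in range(num)]
    return [df_dx(x, theta * x) for x in xs]

rays = {theta: derivative_along_ray(theta) for theta in (0.25, 1.0, 4.0)}
```

Each list in `rays` should be strictly decreasing, in line with the lemma.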

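Lemma 2.2. $f(x,y)$ is concave along lines through the origin. More precisely, $f(x,y)$ is concave
along the line $y = \theta x$ when $0 \le \theta \le \infty$, and strictly concave along this line for $0 < \theta < \infty$.

Proof of Lemma 2.2. When $\theta = 0$ or $\infty$, $f(x,y)$ is linear along the line $y = \theta x$, thus concave. For
$0 < \theta < \infty$, by Lemma 2.1, we have that $\frac{\partial f}{\partial x}$ strictly decreases along lines through the origin. By
symmetry, it follows that $\frac{\partial f}{\partial y}$ also strictly decreases along lines through the origin. Since

$$\frac{d}{dx} f(x,\theta x) = \frac{\partial f}{\partial x}(x,\theta x) + \theta\, \frac{\partial f}{\partial y}(x,\theta x), \tag{6}$$

it is immediate that $\frac{d}{dx} f(x,\theta x)$ also strictly decreases in $x$, which means that $f(x,y)$ is strictly concave
along the line $y = \theta x$. □

Lemma 2.3. If $(x_1, y_1), (x_2, y_2) \in (0,\log 2) \times (0,\log 2)$ and

$$\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)\bigg|_{(x_1,y_1)} = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)\bigg|_{(x_2,y_2)},$$

then $(x_1, y_1) = (x_2, y_2)$.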

Remark 2. The above lemma says that in the interior of the unit square, the pair of partial 
derivatives at a point uniquely determine the point. That this fails on the boundary is seen from 
the fact that for any point of the form (x, 0) the pair of partial derivatives evaluates to (1, 0) and 
for every point of the form (0, y) it is (0, 1). 



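Proof of Lemma 2.3. Suppose $(x_1,y_1) \ne (x_2,y_2)$; we show that the pairs of partial derivatives
at the two points differ. Without loss of generality, assume $x_1 \le x_2$. We consider two cases: $y_1 \ge y_2$
or $y_1 < y_2$.

Suppose $y_1 \ge y_2$; in this case we have

$$\frac{\partial f}{\partial x}\bigg|_{(x_1,y_1)} \le \frac{\partial f}{\partial x}\bigg|_{(x_2,y_1)} \le \frac{\partial f}{\partial x}\bigg|_{(x_2,y_2)}. \tag{7}$$

The first inequality follows from Mrs. Gerber's Lemma. To see why the second inequality is true,
note that

$$\frac{\partial f}{\partial x} = \frac{\partial f / \partial p}{dx / dp} \tag{8}$$

$$= (1-2q)\, \frac{\log\left(\frac{1 - p \star q}{p \star q}\right)}{\log\left(\frac{1-p}{p}\right)}, \tag{9}$$

where $x = h(p)$ and $y = h(q)$ with $0 < p, q < \tfrac{1}{2}$. Thus, for a fixed $p$, as $q$ increases $\frac{\partial f}{\partial x}$ strictly
decreases, i.e. for fixed $x$, as $y$ increases $\frac{\partial f}{\partial x}$ strictly decreases. Note also that at least one of the
two inequalities is strict as $(x_1,y_1) \ne (x_2,y_2)$. Thus

$$\frac{\partial f}{\partial x}\bigg|_{(x_1,y_1)} < \frac{\partial f}{\partial x}\bigg|_{(x_2,y_2)}. \tag{10}$$

It remains to consider the case $y_1 < y_2$. We can also assume $x_1 < x_2$, since $x_1 = x_2$ combined with
$y_1 < y_2$ gives

$$\frac{\partial f}{\partial x}\bigg|_{(x_1,y_1)} > \frac{\partial f}{\partial x}\bigg|_{(x_2,y_2)}.$$

The only remaining case is thus $(x_1,y_1) < (x_2,y_2)$. Consider the line passing through the origin
and $(x_1,y_1)$. We again break this up into two cases: either $y_2 \ge x_2 \frac{y_1}{x_1}$ or $y_2 < x_2 \frac{y_1}{x_1}$.

If $y_2 \ge x_2 \frac{y_1}{x_1}$, we have

$$\frac{\partial f}{\partial x}\bigg|_{(x_1,y_1)} > \frac{\partial f}{\partial x}\bigg|_{\left(x_2,\, x_2 \frac{y_1}{x_1}\right)} \ge \frac{\partial f}{\partial x}\bigg|_{(x_2,y_2)}, \tag{11}$$

where the first inequality follows from Lemma 2.1 and the second follows from $\frac{\partial f}{\partial x}$ decreasing for a
fixed $x$ and an increasing $y$.

If $y_2 < x_2 \frac{y_1}{x_1}$, we have

$$\frac{\partial f}{\partial y}\bigg|_{(x_1,y_1)} > \frac{\partial f}{\partial y}\bigg|_{\left(y_2 \frac{x_1}{y_1},\, y_2\right)} > \frac{\partial f}{\partial y}\bigg|_{(x_2,y_2)}, \tag{12}$$

where the first inequality follows from Lemma 2.1 and the fact that $y_2 \frac{x_1}{y_1} > x_1$. The second
inequality follows from the symmetric analogue of $\frac{\partial f}{\partial x}$ decreasing for a fixed $x$ and an increasing $y$,
which is that $\frac{\partial f}{\partial y}$ decreases for a fixed $y$ and an increasing $x$. This completes the proof of Lemma 2.3.
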
□

3 An EPI and MGL for $\mathbb{Z}_4$-valued random variables

Analogous to the framework for Shannon's EPI in the case of continuous random variables, we
consider two independent random variables $X$ and $Y$ taking values in the cyclic group $\mathbb{Z}_4$ and seek
to determine the minimum possible entropy of the random variable $X + Y$, where $+$ stands for the
group addition, and we a priori fix the entropy of $X$ and that of $Y$.
Formally, we define $f_4 : [0,\log 4] \times [0,\log 4] \to [0,\log 4]$ by

$$f_4(x,y) = \min_{H(X)=x,\; H(Y)=y} H(X + Y). \tag{13}$$

Thus $f_4 = f_{\mathbb{Z}_4}$, where $f_{\mathbb{Z}_4}$ is defined in equation (3). In this section we will also use the notation
$f_2$ for $f_{\mathbb{Z}_2}$, so we have $f_2 : [0,\log 2] \times [0,\log 2] \to [0,\log 2]$ given by

$$f_2(x,y) = \min_{H(X)=x,\; H(Y)=y} H(X + Y). \tag{14}$$

We will prove:

Theorem 3.1.

$$f_4(x,y) = \begin{cases}
x, & \text{if } \log 2 \le x \le \log 4,\ 0 \le y \le \log 2, \\
y, & \text{if } 0 \le x \le \log 2,\ \log 2 \le y \le \log 4, \\
f_2(x,y), & \text{if } 0 \le x, y \le \log 2, \\
f_2(x - \log 2,\, y - \log 2) + \log 2, & \text{if } \log 2 \le x, y \le \log 4.
\end{cases}$$

The following corollary is immediate from Theorem 3.1 and Mrs. Gerber's Lemma.

Corollary 3.1. $f_4(x, y)$ is convex in $x$ for a fixed $y$, and by symmetry convex in $y$ for a fixed $x$.


Proof of Theorem 3.1. We deal with the first two cases first. Without loss of generality, assume
$\log 2 \le x \le \log 4$, $0 \le y \le \log 2$. Note that we have the trivial lower bound

$$f_4(x,y) \ge x \tag{15}$$

obtained from $H(X + Y) \ge H(X)$. Thus if we can find distributions for $X$ and $Y$ such that this
lower bound is achieved, then it implies $f_4(x,y) = x$. This is exactly what we do. Since $y \le \log 2$,
let $\beta = h^{-1}(y)$ and consider the distribution of $Y$

$$p_Y := (\beta, 0, 1-\beta, 0).$$

Also, as $\log 2 \le x$, we can find $\alpha$ such that $\log 2 + H(2\alpha, 1 - 2\alpha) = x$. Using this $\alpha$, define

$$p_X := \left(\alpha, \tfrac{1}{2} - \alpha, \alpha, \tfrac{1}{2} - \alpha\right).$$

The distribution of $X + Y$ is given by the cyclic convolution $p_X \circledast_4 p_Y$, which in this case is $p_X$
again. Thus $H(X + Y) = H(X)$, and $f_4(x, y) = x$.
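
The construction in this step can be checked directly. The sketch below takes $p_X = (\alpha, \tfrac12-\alpha, \alpha, \tfrac12-\alpha)$ and $p_Y = (\beta, 0, 1-\beta, 0)$, which is our reading of the distributions used in the proof, with arbitrary illustrative values of $\alpha$ and $\beta$, and confirms that the cyclic convolution on $\mathbb{Z}_4$ returns $p_X$. Helper names are assumptions of this sketch.

```python
import math

def entropy(p):
    """Shannon entropy in nats of a probability vector p."""
    return -sum(t * math.log(t) for t in p if t > 0)

def cyclic_conv(p, q):
    """Distribution of X + Y on Z_n for independent X ~ p and Y ~ q."""
    n = len(p)
    return [sum(p[i] * q[(k - i) % n] for i in range(n)) for k in range(n)]

alpha, beta = 0.1, 0.3  # illustrative; need 0 <= alpha <= 1/2 and 0 <= beta <= 1
p_x = [alpha, 0.5 - alpha, alpha, 0.5 - alpha]  # invariant under a shift by 2
p_y = [beta, 0.0, 1.0 - beta, 0.0]              # supported on the subgroup {0, 2}
p_z = cyclic_conv(p_x, p_y)
```

Because $p_X$ is invariant under translation by 2 and $Y$ only takes the values 0 and 2, adding $Y$ leaves the distribution unchanged, so `entropy(p_z)` equals `entropy(p_x)`.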



Before starting on the other two cases, we derive some preliminary inequalities. We'll think of
distributions on $\mathbb{Z}_4$ as a combination of distributions supported on $\{0, 2\}$ and $\{1, 3\}$. For a random
variable $X$, we write its distribution $p_X$ as

$$p_X = \alpha(p_0, 0, p_2, 0) + (1 - \alpha)(0, p_1, 0, p_3) = (\alpha p_0, (1 - \alpha)p_1, \alpha p_2, (1 - \alpha)p_3),$$

where $1 \ge p_0, p_1, p_2, p_3, \alpha \ge 0$ and

$$p_0 + p_2 = 1, \qquad p_1 + p_3 = 1.$$

Similarly we write

$$p_Y = \beta(q_0, 0, q_2, 0) + (1 - \beta)(0, q_1, 0, q_3) = (\beta q_0, (1 - \beta)q_1, \beta q_2, (1 - \beta)q_3),$$

where $1 \ge q_0, q_1, q_2, q_3, \beta \ge 0$ and

$$q_0 + q_2 = 1, \qquad q_1 + q_3 = 1.$$

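Let $X + Y = Z$. The distribution of $Z$ is given by

$$p_Z = p_X \circledast_4 p_Y \tag{16}$$

$$= \big(\alpha(p_0, 0, p_2, 0) + (1 - \alpha)(0, p_1, 0, p_3)\big) \circledast_4 \big(\beta(q_0, 0, q_2, 0) + (1 - \beta)(0, q_1, 0, q_3)\big) \tag{17}$$

$$= \big(\alpha\beta\,(p_0, 0, p_2, 0) \circledast_4 (q_0, 0, q_2, 0) + (1 - \alpha)(1 - \beta)\,(0, p_1, 0, p_3) \circledast_4 (0, q_1, 0, q_3)\big)
+ \big(\alpha(1 - \beta)\,(p_0, 0, p_2, 0) \circledast_4 (0, q_1, 0, q_3) + (1 - \alpha)\beta\,(0, p_1, 0, p_3) \circledast_4 (q_0, 0, q_2, 0)\big). \tag{18}$$

Thus

$$H(p_Z) = h(\alpha \star \beta)
+ (1 - \alpha \star \beta)\, H\!\left(\frac{\alpha\beta}{1 - \alpha \star \beta}(p_0, p_2) \circledast_2 (q_0, q_2) + \frac{(1-\alpha)(1-\beta)}{1 - \alpha \star \beta}(p_1, p_3) \circledast_2 (q_1, q_3)\right)
+ (\alpha \star \beta)\, H\!\left(\frac{\alpha(1-\beta)}{\alpha \star \beta}(p_0, p_2) \circledast_2 (q_1, q_3) + \frac{(1-\alpha)\beta}{\alpha \star \beta}(p_1, p_3) \circledast_2 (q_0, q_2)\right) \tag{19}$$

$$\ge h(\alpha \star \beta) + \alpha\beta\, H\big((p_0, p_2) \circledast_2 (q_0, q_2)\big) + (1-\alpha)(1-\beta)\, H\big((p_1, p_3) \circledast_2 (q_1, q_3)\big)
+ \alpha(1-\beta)\, H\big((p_0, p_2) \circledast_2 (q_1, q_3)\big) + (1-\alpha)\beta\, H\big((p_1, p_3) \circledast_2 (q_0, q_2)\big) \tag{20}$$

$$= f_2\big(h(\alpha), h(\beta)\big) + \alpha\beta\, f_2\big(H(p_0, p_2), H(q_0, q_2)\big) + (1-\alpha)(1-\beta)\, f_2\big(H(p_1, p_3), H(q_1, q_3)\big)
+ \alpha(1-\beta)\, f_2\big(H(p_0, p_2), H(q_1, q_3)\big) + (1-\alpha)\beta\, f_2\big(H(p_1, p_3), H(q_0, q_2)\big) \tag{21}$$

$$\ge f_2\big(h(\alpha), h(\beta)\big) + \alpha\, f_2\big(H(p_0, p_2),\ \beta H(q_0, q_2) + (1-\beta) H(q_1, q_3)\big)
+ (1-\alpha)\, f_2\big(H(p_1, p_3),\ \beta H(q_0, q_2) + (1-\beta) H(q_1, q_3)\big) \tag{22}$$

$$\ge f_2\big(h(\alpha), h(\beta)\big) + f_2\big(\alpha H(p_0, p_2) + (1-\alpha) H(p_1, p_3),\ \beta H(q_0, q_2) + (1-\beta) H(q_1, q_3)\big) \tag{23}$$

$$= f_2\big(h(\alpha), h(\beta)\big) + f_2\big(H(X) - h(\alpha),\ H(Y) - h(\beta)\big). \tag{24}$$

In this sequence of inequalities, (19) is a simple expansion of entropy, (20) is obtained via concavity of
entropy, (21) is simply a restatement in terms of $f_2$, (22) and (23) are obtained using convexity in
Mrs. Gerber's Lemma, and the last equality follows from the chain rule of entropy.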
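
The end-to-end bound of this chain, $H(p_Z) \ge f_2(h(\alpha), h(\beta)) + f_2(H(X) - h(\alpha), H(Y) - h(\beta))$, can be spot-checked on random distributions. The sketch below is illustrative only: it assumes natural logarithms, takes $\alpha$ and $\beta$ to be the masses that $X$ and $Y$ place on $\{0,2\}$, computes $h^{-1}$ by bisection, and uses a fixed seed; all helper names are ours.

```python
import math
import random

def entropy(p):
    """Shannon entropy in nats of a probability vector p."""
    return -sum(t * math.log(t) for t in p if t > 0)

def cyclic_conv(p, q):
    """Distribution of X + Y on Z_n for independent X ~ p and Y ~ q."""
    n = len(p)
    return [sum(p[i] * q[(k - i) % n] for i in range(n)) for k in range(n)]

def h(a):
    """Binary entropy in nats."""
    return entropy([a, 1 - a]) if 0.0 < a < 1.0 else 0.0

def h_inv(x):
    """Inverse of h on [0, 1/2] by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(80):
        mid = (lo + hi) / 2
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def f2(x, y):
    """h(h^{-1}(x) * h^{-1}(y)) with a * b = a(1 - b) + b(1 - a)."""
    a, b = h_inv(x), h_inv(y)
    return h(a * (1 - b) + b * (1 - a))

def lower_bound(p, q):
    """Final expression of the chain for distributions p, q on Z_4."""
    alpha, beta = p[0] + p[2], q[0] + q[2]  # mass on the subgroup {0, 2}
    return f2(h(alpha), h(beta)) + f2(entropy(p) - h(alpha), entropy(q) - h(beta))

def random_dist(rng, n=4):
    """A random probability vector of length n."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [t / s for t in w]

rng = random.Random(1)
slack = []
for _ in range(300):
    p, q = random_dist(rng), random_dist(rng)
    slack.append(entropy(cyclic_conv(p, q)) - lower_bound(p, q))
```

Every entry of `slack` should be nonnegative up to rounding; the bound is tight, for instance, when both distributions are uniform.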



Coming back to the remaining two cases of Theorem 3.1, we can write down the following
inequalities as consequences of the above inequalities:
For $0 \le x, y \le \log 2$,

$$f_4(x,y) \ge \min_{u,v}\ f_2(u,v) + f_2(x - u, y - v), \tag{25}$$

where $0 \le u \le x$ and $0 \le v \le y$.
For $\log 2 \le x, y \le \log 4$,

$$f_4(x,y) \ge \min_{u,v}\ f_2(u,v) + f_2(x - u, y - v), \tag{26}$$

where $x - \log 2 \le u \le \log 2$ and $y - \log 2 \le v \le \log 2$.

Consider the third case, $0 \le x, y \le \log 2$. We'll show that the minimum in (25) is attained when $u, v$ are
both equal to 0 (or by symmetry $u = x$, $v = y$) and the value of the minimum is $f_2(x, y)$.

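We'll first prove a small claim.

Claim 3.1. $\left|\frac{\partial f_2}{\partial x}\right| \le 1$, with strict inequality if $(x,y)$ lies in the interior of the square
$[0,\log 2] \times [0,\log 2]$.

Remark 3. By symmetry, $\left|\frac{\partial f_2}{\partial y}\right| \le 1$, with strict inequality in the interior.

Proof of Claim 3.1. We note that when $y = 0$, $f_2(x,0) = x$, which gives $\frac{\partial f_2}{\partial x} = 1$. Now fix
$y > 0$. By Mrs. Gerber's Lemma, we know that $f_2(x,y)$ is convex in $x$ for a fixed $y$. This means
that $\frac{\partial f_2}{\partial x}$ increases with $x$ and is maximum when $x = \log 2$. Writing $x = h(p)$ and $y = h(q)$ with
$0 \le p, q \le \tfrac{1}{2}$, we have $f_2(x, y) = h(p \star q)$, and

$$\frac{\partial f_2}{\partial x} = (1-2q)\, \frac{\log\left(\frac{1 - p \star q}{p \star q}\right)}{\log\left(\frac{1-p}{p}\right)}. \tag{27}$$

Taking the limit as $x \to \log 2$ is the same as taking the limit as $p \to \tfrac{1}{2}$. Using L'Hôpital's rule, we get

$$\lim_{p \to \frac{1}{2}} (1-2q)\, \frac{\log\left(\frac{1 - p \star q}{p \star q}\right)}{\log\left(\frac{1-p}{p}\right)} = \lim_{p \to \frac{1}{2}} (1-2q)^2\, \frac{p(1-p)}{(p \star q)(1 - p \star q)}. \tag{28}$$

This is easily seen to be $(1-2q)^2$, which has magnitude $< 1$ for $q \ne 0$. This establishes the claim. □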
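
The bound and the limiting value in this proof can be examined numerically. The sketch below evaluates the closed-form derivative $(1-2q)\log\frac{1-r}{r} \big/ \log\frac{1-p}{p}$ with $r = p(1-q) + q(1-p)$ on an interior grid and near $p = \tfrac12$; the grid, the offset $10^{-6}$, and the function name are illustrative assumptions.

```python
import math

def df2_dx(p, q):
    """x-derivative of h(p * q) at x = h(p), y = h(q), for 0 < p, q < 1/2."""
    r = p * (1 - q) + q * (1 - p)
    return (1 - 2 * q) * math.log((1 - r) / r) / math.log((1 - p) / p)

grid = [0.01 * i for i in range(1, 50)]  # interior points 0.01, ..., 0.49
max_interior = max(df2_dx(p, q) for p in grid for q in grid)

# Behaviour as p -> 1/2 for a few fixed q, compared with (1 - 2q)^2.
p_near_half = 0.5 - 1e-6
limit_gaps = [abs(df2_dx(p_near_half, q) - (1 - 2 * q) ** 2) for q in (0.05, 0.2, 0.4)]
```

On this grid `max_interior` stays strictly below 1, and the values near $p = \tfrac12$ agree with $(1-2q)^2$ to high accuracy.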
Now for $0 \le x, y \le \log 2$, consider the function $g : [0, x] \times [0, y] \to \mathbb{R}$ given by

$$g(u,v) := f_2(u,v) + f_2(x - u, y - v).$$

As per (25), we want to minimise $g$ over its domain. We can think of the domain as a rectangle
with corner points $(0, 0)$ and $(x, y)$ in $\mathbb{R}^2$. Suppose the minimum is achieved strictly in the interior
of this rectangle, at a point say $(u^*, v^*)$; then we must have

$$\frac{\partial g}{\partial u}\bigg|_{(u^*,v^*)} = 0, \tag{29}$$

$$\frac{\partial g}{\partial v}\bigg|_{(u^*,v^*)} = 0, \tag{30}$$

which implies

$$\frac{\partial f_2}{\partial u}\bigg|_{(u^*,v^*)} = \frac{\partial f_2}{\partial u}\bigg|_{(x-u^*,\,y-v^*)}, \tag{31}$$

$$\frac{\partial f_2}{\partial v}\bigg|_{(u^*,v^*)} = \frac{\partial f_2}{\partial v}\bigg|_{(x-u^*,\,y-v^*)}. \tag{32}$$

By Lemma 2.3, we infer that

$$(u^*, v^*) = (x - u^*, y - v^*). \tag{33}$$

Thus $(u^*, v^*) = \left(\tfrac{x}{2}, \tfrac{y}{2}\right)$. Now let $\theta = \tfrac{y}{x}$, and consider the function $g$ over the line with slope $\theta$ passing
through the origin. By Lemma 2.2, we know that $f_2(t, \theta t)$ is concave, and thus so is $f_2(x - t, y - \theta t)$,
and so is their sum $g(t, \theta t)$. Thus, the minimum value of $g(t, \theta t)$ must be attained at the
extreme points and not in the interior. Note that since $(u^*, v^*)$ lies on this line, it cannot be the
global minimum of $g$ on its domain. This leads us to conclude that the global minimum of $g$ is not
attained anywhere in the interior of the rectangle and therefore must be attained on the boundary.



Now consider a point $(u_0, 0)$ along the boundary. Taking the partial derivative with respect to
$u$,

$$\frac{\partial g}{\partial u}\bigg|_{(u_0,0)} = \frac{\partial f_2}{\partial u}\bigg|_{(u_0,0)} - \frac{\partial f_2}{\partial u}\bigg|_{(x-u_0,\,y)} \ge 0, \tag{34}$$

where the inequality follows from $\frac{\partial f_2}{\partial u}\Big|_{(u_0,0)} = 1$ and $\left|\frac{\partial f_2}{\partial u}\Big|_{(x-u_0,\,y)}\right| \le 1$ by Claim 3.1. Similarly, for
a boundary point of the form $(0, v_0)$ we can see that $\frac{\partial g}{\partial v}\Big|_{(0,v_0)} \ge 0$. Hence, we conclude that the
minimum value on the boundary is attained when $u = 0$, $v = 0$ and the value is $f_2(x,y)$. Thus
inequality (25) reduces to

$$f_4(x,y) \ge f_2(x,y). \tag{35}$$

Clearly, $f_2(x,y)$ is achieved if the random variables are supported on $\{0, 2\}$, and therefore we
get

$$f_4(x,y) = f_2(x,y) \quad \text{for } 0 \le x, y \le \log 2. \tag{36}$$

This completes the proof for the third case.
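
As an informal check of the minimization argument in this case, the sketch below evaluates $g(u,v) = f_2(u,v) + f_2(x-u, y-v)$ on a grid over $[0,x] \times [0,y]$ for one arbitrarily chosen point $(x,y)$ and locates the smallest grid value. Natural logarithms, the bisection-based $h^{-1}$, the grid size, and all names are assumptions of this sketch; a grid search only suggests, and does not prove, where the minimum lies.

```python
import math

def h(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def h_inv(x):
    """Inverse of h on [0, 1/2] by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(80):
        mid = (lo + hi) / 2
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def f2(x, y):
    """h(h^{-1}(x) * h^{-1}(y)) with a * b = a(1 - b) + b(1 - a)."""
    a, b = h_inv(x), h_inv(y)
    return h(a * (1 - b) + b * (1 - a))

x, y = 0.5, 0.4  # an arbitrary point with 0 < x, y < log 2
n = 20           # grid resolution per axis
grid_values = {
    (i, j): f2(x * i / n, y * j / n) + f2(x - x * i / n, y - y * j / n)
    for i in range(n + 1)
    for j in range(n + 1)
}
best_index = min(grid_values, key=grid_values.get)
best_value = grid_values[best_index]
```

The smallest grid value occurs at a corner, $(u,v) = (0,0)$ or by symmetry $(x,y)$, and equals $f_2(x,y)$.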



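Moving on to the last case, define $\bar{u} = u - (x - \log 2)$ and $\bar{v} = v - (y - \log 2)$. Rewriting (26),

$$f_4(x,y) \ge \min_{\bar{u},\bar{v}}\ f_2(\log 2 - \bar{u}, \log 2 - \bar{v}) + f_2\big(\bar{u} + (x - \log 2), \bar{v} + (y - \log 2)\big), \tag{37}$$

where

$$0 \le \bar{u} \le 2\log 2 - x, \qquad 0 \le \bar{v} \le 2\log 2 - y.$$

Just as in the previous case, define

$$g(\bar{u}, \bar{v}) := f_2(\log 2 - \bar{u}, \log 2 - \bar{v}) + f_2\big(\bar{u} + (x - \log 2), \bar{v} + (y - \log 2)\big).$$

The domain of $(\bar{u}, \bar{v})$ can be thought of as a rectangle in $\mathbb{R}^2$ with corner points $(0,0)$ and
$(2\log 2 - x, 2\log 2 - y)$. Suppose the minimum value is attained at $(\bar{u}^*, \bar{v}^*)$ lying in the interior of this
rectangular domain. In such a case we must have

$$\frac{\partial g}{\partial \bar{u}}\bigg|_{(\bar{u}^*,\bar{v}^*)} = 0, \tag{38}$$

$$\frac{\partial g}{\partial \bar{v}}\bigg|_{(\bar{u}^*,\bar{v}^*)} = 0, \tag{39}$$

which implies

$$\frac{\partial f_2}{\partial u}\bigg|_{(\log 2-\bar{u}^*,\ \log 2-\bar{v}^*)} = \frac{\partial f_2}{\partial u}\bigg|_{(\bar{u}^*+(x-\log 2),\ \bar{v}^*+(y-\log 2))}, \tag{40}$$

$$\frac{\partial f_2}{\partial v}\bigg|_{(\log 2-\bar{u}^*,\ \log 2-\bar{v}^*)} = \frac{\partial f_2}{\partial v}\bigg|_{(\bar{u}^*+(x-\log 2),\ \bar{v}^*+(y-\log 2))}. \tag{41}$$

By Lemma 2.3, we infer that

$$(\log 2 - \bar{u}^*, \log 2 - \bar{v}^*) = \big(\bar{u}^* + (x - \log 2), \bar{v}^* + (y - \log 2)\big), \tag{42}$$

which implies $(\bar{u}^*, \bar{v}^*) = \left(\log 2 - \tfrac{x}{2}, \log 2 - \tfrac{y}{2}\right)$. Now if $(\bar{u}^*, \bar{v}^*)$ were indeed the global minimum,
then we must have for every $\theta$

$$\frac{d^2}{dt^2}\, g(\bar{u}^* + t, \bar{v}^* + \theta t)\bigg|_{t=0} \ge 0. \tag{43}$$

Note that

$$g(\bar{u}^* + t, \bar{v}^* + \theta t) = f_2\left(\tfrac{x}{2} - t, \tfrac{y}{2} - \theta t\right) + f_2\left(\tfrac{x}{2} + t, \tfrac{y}{2} + \theta t\right). \tag{44}$$

Now choose $\theta = \tfrac{y}{x}$. By Lemma 2.2 we have

$$\frac{d^2}{dt^2}\, g(\bar{u}^* + t, \bar{v}^* + \theta t)\bigg|_{t=0} = 2\, \frac{d^2}{dt^2}\, f_2\left(\tfrac{x}{2} - t, \tfrac{y}{2} - \theta t\right)\bigg|_{t=0} < 0. \tag{45}$$

This means that $(\bar{u}^*, \bar{v}^*)$ cannot be the global minimum, and the global minimum therefore must
lie on the boundary. For a boundary point of the form $(\bar{u}_0, 0)$,

$$\frac{\partial g}{\partial \bar{u}}\bigg|_{(\bar{u}_0,0)} = -\frac{\partial f_2}{\partial u}\bigg|_{(\log 2-\bar{u}_0,\ \log 2)} + \frac{\partial f_2}{\partial u}\bigg|_{(\bar{u}_0+x-\log 2,\ y-\log 2)} \ge 0, \tag{46}$$

where the inequality follows from $\frac{\partial f_2}{\partial u}\Big|_{(\log 2-\bar{u}_0,\ \log 2)} = 0$ and $\frac{\partial f_2}{\partial u}\Big|_{(\bar{u}_0+x-\log 2,\ y-\log 2)} \ge 0$. Similarly for
a boundary point of the form $(0, \bar{v}_0)$ we have

$$\frac{\partial g}{\partial \bar{v}}\bigg|_{(0,\bar{v}_0)} = -\frac{\partial f_2}{\partial v}\bigg|_{(\log 2,\ \log 2-\bar{v}_0)} + \frac{\partial f_2}{\partial v}\bigg|_{(x-\log 2,\ \bar{v}_0+y-\log 2)} \ge 0. \tag{47}$$

In both cases we see that the minimum is attained when $(\bar{u}, \bar{v}) = (0, 0)$ and the value of the
minimum is

$$f_2(\log 2, \log 2) + f_2(x - \log 2, y - \log 2) = \log 2 + f_2(x - \log 2, y - \log 2).$$

Thus inequality (26) reduces to

$$f_4(x,y) \ge \log 2 + f_2(x - \log 2, y - \log 2). \tag{48}$$

Since $\log 2 \le x, y \le \log 4$, we can find distributions $p_X = \left(\tfrac{\alpha}{2}, \tfrac{1-\alpha}{2}, \tfrac{\alpha}{2}, \tfrac{1-\alpha}{2}\right)$ and $p_Y = \left(\tfrac{\beta}{2}, \tfrac{1-\beta}{2}, \tfrac{\beta}{2}, \tfrac{1-\beta}{2}\right)$
such that $H(p_X) = x$ and $H(p_Y) = y$, with

$$x = h(\alpha) + \log 2, \tag{49}$$

$$y = h(\beta) + \log 2, \tag{50}$$

$$H(X + Y) = H(p_X \circledast_4 p_Y) = H\left(\tfrac{1 - \alpha \star \beta}{2}, \tfrac{\alpha \star \beta}{2}, \tfrac{1 - \alpha \star \beta}{2}, \tfrac{\alpha \star \beta}{2}\right) \tag{51}$$

$$= h(\alpha \star \beta) + \log 2 = f_2\big(h(\alpha), h(\beta)\big) + \log 2 = f_2(x - \log 2, y - \log 2) + \log 2. \tag{52}$$

Thus the bound in (48) is achieved, and we conclude that

$$f_4(x,y) = \log 2 + f_2(x - \log 2, y - \log 2). \tag{53}$$

This completes the proof of Theorem 3.1. □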
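
To make the statement of Theorem 3.1 concrete, the sketch below implements the four-case formula for $f_4$ and compares it with $H(X+Y)$ for randomly drawn pairs of distributions on $\mathbb{Z}_4$, and for one pair of the form $(\tfrac{\alpha}{2}, \tfrac{1-\alpha}{2}, \tfrac{\alpha}{2}, \tfrac{1-\alpha}{2})$, which is our reading of the extremal distributions at the end of the proof. Everything here (names, seed, sampling scheme, natural logarithms, bisection for $h^{-1}$) is an illustrative assumption rather than the authors' code.

```python
import math
import random

LOG2 = math.log(2)

def entropy(p):
    """Shannon entropy in nats of a probability vector p."""
    return -sum(t * math.log(t) for t in p if t > 0)

def cyclic_conv(p, q):
    """Distribution of X + Y on Z_n for independent X ~ p and Y ~ q."""
    n = len(p)
    return [sum(p[i] * q[(k - i) % n] for i in range(n)) for k in range(n)]

def h(a):
    """Binary entropy in nats."""
    return entropy([a, 1 - a]) if 0.0 < a < 1.0 else 0.0

def h_inv(x):
    """Inverse of h on [0, 1/2] by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(80):
        mid = (lo + hi) / 2
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def f2(x, y):
    """h(h^{-1}(x) * h^{-1}(y)) with a * b = a(1 - b) + b(1 - a)."""
    a, b = h_inv(x), h_inv(y)
    return h(a * (1 - b) + b * (1 - a))

def f4(x, y):
    """Piecewise formula of Theorem 3.1 on [0, log 4]^2."""
    if x <= LOG2 and y <= LOG2:
        return f2(x, y)
    if x >= LOG2 and y >= LOG2:
        return f2(x - LOG2, y - LOG2) + LOG2
    return max(x, y)

def random_dist(rng, n=4):
    """Random probability vector; the 4th power spreads the entropies out."""
    w = [rng.random() ** 4 for _ in range(n)]
    s = sum(w)
    return [t / s for t in w]

rng = random.Random(2)
slack = []
for _ in range(300):
    p, q = random_dist(rng), random_dist(rng)
    slack.append(entropy(cyclic_conv(p, q)) - f4(entropy(p), entropy(q)))

# A pair of the extremal form used at the end of the proof attains the bound.
a, b = 0.2, 0.35
p_x = [a / 2, (1 - a) / 2, a / 2, (1 - a) / 2]
p_y = [b / 2, (1 - b) / 2, b / 2, (1 - b) / 2]
gap = entropy(cyclic_conv(p_x, p_y)) - f4(entropy(p_x), entropy(p_y))
```

All entries of `slack` should be nonnegative up to rounding, and `gap` should vanish up to rounding for the extremal pair.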



Proof of Corollary 3.1. Consider the function $f_x(y) = f_4(x, y)$. We look at two cases, $0 \le x \le \log 2$
and $\log 2 \le x \le \log 4$. In the first case,

$$f_x(y) = \begin{cases}
f_2(x,y), & \text{if } 0 \le y \le \log 2, \\
y, & \text{if } \log 2 \le y \le \log 4.
\end{cases}$$

Now $f_2(x, y)$ for a fixed $x$ and $0 \le y \le \log 2$ is convex by MGL, and for values of $y$ beyond $\log 2$
the function $f_x$ is linear with slope 1. By Claim 3.1, attaching this linear part to a convex function
will not affect the convexity, since the slope of the linear part (= 1) is greater than or equal to the
derivative of the convex part. Similarly for the second case,

$$f_x(y) = \begin{cases}
x, & \text{if } 0 \le y \le \log 2, \\
f_2(x - \log 2, y - \log 2) + \log 2, & \text{if } \log 2 \le y \le \log 4.
\end{cases}$$

This too has a linear part, with slope 0, attached before a convex part with slope greater than or equal
to 0 everywhere; thus the overall function continues to be convex. □



4 An EPI and MGL for $\mathbb{Z}_{2^n}$-valued random variables

Analogous to the $\mathbb{Z}_4$ case, we consider two independent random variables $X$ and $Y$ taking values in
the cyclic group $\mathbb{Z}_{2^n}$ and seek to determine the minimum possible entropy of the random variable
$X + Y$, where $+$ stands for the group addition, and where we a priori fix the value of $H(X)$ and the
value of $H(Y)$.
Formally, we define $f_{2^n} : [0, n\log 2] \times [0, n\log 2] \to [0, n\log 2]$ by

$$f_{2^n}(x,y) = \min_{H(X)=x,\; H(Y)=y} H(X + Y). \tag{54}$$

Thus $f_{2^n} = f_{\mathbb{Z}_{2^n}}$, where $f_{\mathbb{Z}_{2^n}}$ is defined in equation (3).
$f_{2^n}$ is completely determined in the following theorem:

Theorem 4.1.

$$f_{2^n}(x,y) = \begin{cases}
f_2(x - k\log 2,\, y - k\log 2) + k\log 2, & \text{if } k\log 2 \le x, y \le (k+1)\log 2 \text{ for some integer } 0 \le k \le n-1, \\
\max(x, y), & \text{otherwise.}
\end{cases}$$
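
The formula of Theorem 4.1 translates into a few lines of code. The sketch below is a hedged illustration: it assumes natural logarithms, indexes the band of an entropy value by $k = \lfloor x / \log 2 \rfloor$ (capped at $n-1$), computes $h^{-1}$ by bisection, and spot-checks the resulting lower bound on random distributions over $\mathbb{Z}_8$ with a fixed seed. The name `f_pow2` and the other helpers are ours.

```python
import math
import random

LOG2 = math.log(2)

def entropy(p):
    """Shannon entropy in nats of a probability vector p."""
    return -sum(t * math.log(t) for t in p if t > 0)

def cyclic_conv(p, q):
    """Distribution of X + Y on Z_n for independent X ~ p and Y ~ q."""
    n = len(p)
    return [sum(p[i] * q[(k - i) % n] for i in range(n)) for k in range(n)]

def h(a):
    """Binary entropy in nats."""
    return entropy([a, 1 - a]) if 0.0 < a < 1.0 else 0.0

def h_inv(x):
    """Inverse of h on [0, 1/2] by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(80):
        mid = (lo + hi) / 2
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def f2(x, y):
    """h(h^{-1}(x) * h^{-1}(y)) with a * b = a(1 - b) + b(1 - a)."""
    a, b = h_inv(x), h_inv(y)
    return h(a * (1 - b) + b * (1 - a))

def f_pow2(x, y, n):
    """Formula of Theorem 4.1 for the cyclic group of order 2^n."""
    kx = max(0, min(int(x // LOG2), n - 1))  # band index of x
    ky = max(0, min(int(y // LOG2), n - 1))  # band index of y
    if kx == ky:
        return f2(x - kx * LOG2, y - kx * LOG2) + kx * LOG2
    return max(x, y)

def random_dist(rng, size):
    """Random probability vector; the 4th power spreads the entropies out."""
    w = [rng.random() ** 4 for _ in range(size)]
    s = sum(w)
    return [t / s for t in w]

rng = random.Random(3)
slack = []
for _ in range(200):
    p, q = random_dist(rng, 8), random_dist(rng, 8)
    slack.append(entropy(cyclic_conv(p, q)) - f_pow2(entropy(p), entropy(q), 3))
```

At band boundaries the two branches agree (for example $f_2(0, t) = t$), so the choice of band for a boundary value does not change the result.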



The following corollary is an immediate consequence of Theorem 4.1 and Mrs. Gerber's Lemma:

Corollary 4.1. $f_{2^n}(x,y)$ is convex in $x$ for a fixed $y$, and by symmetry is convex in $y$ for a fixed
$x$.



Proof of Theorem 4.1. We deal with the second case first. Assume

$$k_1 \log 2 \le x \le (k_1 + 1)\log 2,$$

$$k_2 \log 2 \le y \le (k_2 + 1)\log 2,$$

where $k_1 \ne k_2$. Without loss of generality, assume $k_1 > k_2$. Note that we have the trivial lower
bound

$$f_{2^n}(x,y) \ge x \tag{55}$$

obtained from $H(X + Y) \ge H(X)$. Thus if we can find distributions for $X$ and $Y$ such that this
lower bound is achieved, then this would imply that $f_{2^n}(x,y) = x$. This is exactly what we do.
Since $y \le k_1 \log 2$, let the distribution of $Y$ be any distribution supported on the subgroup $\mathbb{Z}_{2^{k_1}}$,
which is contained in $\mathbb{Z}_{2^n}$, such that $H(p_Y) = y$. Here as usual the subgroup $\mathbb{Z}_{2^k}$ in $\mathbb{Z}_{2^n}$ is the
set $\{0, 2^{n-k}, 2 \cdot 2^{n-k}, 3 \cdot 2^{n-k}, \ldots, (2^k - 1)2^{n-k}\}$. Also, as $k_1 \log 2 \le x \le (k_1 + 1)\log 2$, we can find
a distribution of $X$ with $H(p_X) = x$ which is supported on the subgroup $\mathbb{Z}_{2^{k_1+1}}$ and is constant over
each of the cosets in $\mathbb{Z}_{2^{k_1+1}}/\mathbb{Z}_{2^{k_1}}$. The distribution of $X + Y$ is given by the cyclic convolution
$p_X \circledast_{2^n} p_Y$, which in this case is $p_X$ again. Thus $H(X + Y) = H(X)$, and $f_{2^n}(x, y) = x$.



Before considering the remaining case, we derive some preliminary inequalities. We'll use
induction: assume that the theorem and the corollary are true for $2^{n-1}$ and prove them for $2^n$. We'll think
of distributions on $\mathbb{Z}_{2^n}$ as a combination of distributions supported on the cosets of $\mathbb{Z}_{2^{n-1}}$ in $\mathbb{Z}_{2^n}$.
For a random variable $X$, we can write

$$p_X = \alpha p_E + (1 - \alpha) p_O,$$

where $1 \ge \alpha \ge 0$, with $p_E$ supported only on the subgroup $\mathbb{Z}_{2^{n-1}}$ of $\mathbb{Z}_{2^n}$ and $p_O$ supported on the
remaining half of $\mathbb{Z}_{2^n}$. Similarly we write

$$p_Y = \beta q_E + (1 - \beta) q_O,$$

where $1 \ge \beta \ge 0$.

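Let $X + Y = Z$. The distribution of $Z$ is given by

$$p_Z = p_X \circledast_{2^n} p_Y \tag{56}$$

$$= \big(\alpha p_E + (1 - \alpha) p_O\big) \circledast_{2^n} \big(\beta q_E + (1 - \beta) q_O\big) \tag{57}$$

$$= \big(\alpha\beta\, p_E \circledast_{2^n} q_E + (1 - \alpha)(1 - \beta)\, p_O \circledast_{2^n} q_O\big) + \big(\alpha(1 - \beta)\, p_E \circledast_{2^n} q_O + (1 - \alpha)\beta\, p_O \circledast_{2^n} q_E\big). \tag{58}$$

Thus

$$H(p_Z) = h(\alpha \star \beta)
+ (1 - \alpha \star \beta)\, H\!\left(\frac{\alpha\beta}{1 - \alpha \star \beta}\, p_E \circledast_{2^{n-1}} q_E + \frac{(1-\alpha)(1-\beta)}{1 - \alpha \star \beta}\, p_O \circledast_{2^{n-1}} q_O\right)
+ (\alpha \star \beta)\, H\!\left(\frac{\alpha(1-\beta)}{\alpha \star \beta}\, p_E \circledast_{2^{n-1}} q_O + \frac{(1-\alpha)\beta}{\alpha \star \beta}\, p_O \circledast_{2^{n-1}} q_E\right) \tag{59}$$

$$\ge h(\alpha \star \beta) + \alpha\beta\, H\big(p_E \circledast_{2^{n-1}} q_E\big) + (1-\alpha)(1-\beta)\, H\big(p_O \circledast_{2^{n-1}} q_O\big)
+ \alpha(1-\beta)\, H\big(p_E \circledast_{2^{n-1}} q_O\big) + (1-\alpha)\beta\, H\big(p_O \circledast_{2^{n-1}} q_E\big) \tag{60}$$

$$\ge f_2\big(h(\alpha), h(\beta)\big) + \alpha\beta\, f_{2^{n-1}}\big(H(p_E), H(q_E)\big) + (1-\alpha)(1-\beta)\, f_{2^{n-1}}\big(H(p_O), H(q_O)\big)
+ \alpha(1-\beta)\, f_{2^{n-1}}\big(H(p_E), H(q_O)\big) + (1-\alpha)\beta\, f_{2^{n-1}}\big(H(p_O), H(q_E)\big) \tag{61}$$

$$\ge f_2\big(h(\alpha), h(\beta)\big) + \alpha\, f_{2^{n-1}}\big(H(p_E),\ \beta H(q_E) + (1-\beta) H(q_O)\big)
+ (1-\alpha)\, f_{2^{n-1}}\big(H(p_O),\ \beta H(q_E) + (1-\beta) H(q_O)\big) \tag{62}$$

$$\ge f_2\big(h(\alpha), h(\beta)\big) + f_{2^{n-1}}\big(\alpha H(p_E) + (1-\alpha) H(p_O),\ \beta H(q_E) + (1-\beta) H(q_O)\big) \tag{63}$$

$$= f_2\big(h(\alpha), h(\beta)\big) + f_{2^{n-1}}\big(H(X) - h(\alpha),\ H(Y) - h(\beta)\big). \tag{64}$$

In this sequence of inequalities, (59) is a simple expansion of entropy, (60) is obtained via concavity of
entropy, (61) uses the definition of $f$, (62) and (63) are obtained using Mrs. Gerber's Lemma
for $2^{n-1}$ (by the induction hypothesis), and the last equality follows from the chain rule of entropy.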



Coming back to the remaining cases of Theorem 4.1, we can write down the following inequalities
as consequences of the preceding sequence of inequalities:
For $0 \le x, y \le \log 2$,

$$f_{2^n}(x,y) \ge \min_{u,v}\ f_2(u,v) + f_{2^{n-1}}(x - u, y - v), \tag{65}$$

where $0 \le u \le x$ and $0 \le v \le y$.
For $(n - 1)\log 2 \le x, y \le n\log 2$,

$$f_{2^n}(x,y) \ge \min_{u,v}\ f_2(u,v) + f_{2^{n-1}}(x - u, y - v), \tag{66}$$

where $x - (n - 1)\log 2 \le u \le \log 2$ and $y - (n - 1)\log 2 \le v \le \log 2$.
For $(k - 1)\log 2 \le x, y \le k\log 2$, $k \ne 1, n$,

$$f_{2^n}(x,y) \ge \min_{u,v}\ f_2(u,v) + f_{2^{n-1}}(x - u, y - v), \tag{67}$$

where $0 \le u \le \log 2$ and $0 \le v \le \log 2$.

We'll consider the above three cases separately and prove the theorem in each of those three
cases.

Claim 4.1. For $0 \le x, y \le \log 2$ we have

$$f_{2^n}(x,y) = f_2(x,y).$$

Proof of Claim 4.1. From equation (65) we have

$$f_{2^n}(x,y) \ge \min_{u,v}\ f_2(u,v) + f_{2^{n-1}}(x - u, y - v),$$

where the minimum is over $0 \le u \le x$, $0 \le v \le y$. However, by our induction hypothesis

$$f_2(u,v) + f_{2^{n-1}}(x - u, y - v) = f_2(u,v) + f_2(x - u, y - v),$$

and from the proof of the $\mathbb{Z}_4$ case, the value of this minimum is $f_2(x, y)$. Since this value is clearly
achieved, we have $f_{2^n}(x,y) = f_2(x,y)$. □

Claim 4.2. For $(k-1)\log 2 \le x, y \le k\log 2$, $k \ne 1, n$, we have

$$f_{2^n}(x,y) = (k-1)\log 2 + f_2\big(x-(k-1)\log 2,\; y-(k-1)\log 2\big).$$

Proof of Claim 4.2. From (66) we have

$$f_{2^n}(x,y) \ge \min_{u,v} f_2(u,v) + f_{2^{n-1}}(x-u,\, y-v).$$

We first note that if the minimum of the above expression occurs at $(u^*,v^*)$ then we must have

$$(k-1)\log 2 \le x-u^*,\; y-v^* \le k\log 2, \tag{68}$$

or

$$(k-2)\log 2 \le x-u^*,\; y-v^* \le (k-1)\log 2. \tag{69}$$

To see this, suppose that w.l.o.g. we have

$$(k-2)\log 2 \le x-u^* < (k-1)\log 2,$$

$$(k-1)\log 2 \le y-v^* \le k\log 2.$$

Let $\tilde{u}$ be such that $x-\tilde{u} = (k-1)\log 2$. We have $\tilde{u} < u^*$. By the induction hypothesis,

$$f_{2^{n-1}}(x-u^*,\, y-v^*) = f_{2^{n-1}}(x-\tilde{u},\, y-v^*) = y-v^*.$$

But since $\tilde{u} < u^*$ we also have

$$f_2(\tilde{u},v^*) < f_2(u^*,v^*).$$

This leads us to conclude that

$$f_2(\tilde{u},v^*) + f_{2^{n-1}}(x-\tilde{u},\, y-v^*) < f_2(u^*,v^*) + f_{2^{n-1}}(x-u^*,\, y-v^*),$$

which contradicts $(u^*,v^*)$ being the minimizer. Now suppose we minimize over all pairs $u,v$ such that (68) holds. By the induction hypothesis,

$$\min_{u,v} f_2(u,v) + f_{2^{n-1}}(x-u,\, y-v) = \min_{u,v} f_2(u,v) + f_2\big(x-u-(k-1)\log 2,\; y-v-(k-1)\log 2\big) + (k-1)\log 2$$

$$= (k-1)\log 2 + \min_{u,v} f_2(u,v) + f_2\big(x-(k-1)\log 2-u,\; y-(k-1)\log 2-v\big).$$

From the proof of the $\mathbb{Z}_4$ case, the minimum of the above expression is attained when $u, v = 0$, which gives us

$$\min_{u,v} f_2(u,v) + f_{2^{n-1}}(x-u,\, y-v) = (k-1)\log 2 + f_2\big(x-(k-1)\log 2,\; y-(k-1)\log 2\big), \tag{70}$$

where it is implicit that the minimization is taken over all pairs $u,v$ such that (68) holds.

Now we minimize over all pairs $u,v$ such that (69) holds. By the induction hypothesis,

$$\min_{u,v} f_2(u,v) + f_{2^{n-1}}(x-u,\, y-v) = \min_{u,v} f_2(u,v) + f_2\big(x-u-(k-2)\log 2,\; y-v-(k-2)\log 2\big) + (k-2)\log 2$$

$$= (k-2)\log 2 + \min_{u,v} f_2(u,v) + f_2\big(x-(k-2)\log 2-u,\; y-(k-2)\log 2-v\big).$$

Again, by the proof of the $\mathbb{Z}_4$ case, the minimum value of the above expression is attained when $u, v = \log 2$. Substituting, we get

$$\min_{u,v} f_2(u,v) + f_{2^{n-1}}(x-u,\, y-v) = (k-2)\log 2 + f_2(\log 2,\log 2) + f_2\big(x-(k-1)\log 2,\; y-(k-1)\log 2\big)$$

$$= (k-1)\log 2 + f_2\big(x-(k-1)\log 2,\; y-(k-1)\log 2\big). \tag{71}$$

Comparing (70) and (71) we arrive at

$$f_{2^n}(x,y) \ge \min_{u,v} f_2(u,v) + f_{2^{n-1}}(x-u,\, y-v)$$

$$= (k-1)\log 2 + f_2\big(x-(k-1)\log 2,\; y-(k-1)\log 2\big)$$

$$= f_{2^{n-1}}(x,y).$$

Since $f_{2^{n-1}}(x,y)$ is achieved by supporting $X$ and $Y$ on $\mathbb{Z}_{2^{n-1}}$, we have $f_{2^n}(x,y) = (k-1)\log 2 + f_2\big(x-(k-1)\log 2,\; y-(k-1)\log 2\big)$, proving the claim. □

Claim 4.3. For $(n-1)\log 2 \le x, y \le n\log 2$,

$$f_{2^n}(x,y) = (n-1)\log 2 + f_2\big(x-(n-1)\log 2,\; y-(n-1)\log 2\big).$$

Proof of Claim 4.3. We have

$$f_{2^n}(x,y) \ge \min_{u,v} f_2(u,v) + f_{2^{n-1}}(x-u,\, y-v),$$

where $x-(n-1)\log 2 \le u \le \log 2$ and $y-(n-1)\log 2 \le v \le \log 2$. Using our induction hypothesis,

$$\min_{u,v} f_2(u,v) + f_{2^{n-1}}(x-u,\, y-v) = \min_{u,v} f_2(u,v) + f_2\big(x-u-(n-2)\log 2,\; y-v-(n-2)\log 2\big) + (n-2)\log 2$$

$$= (n-2)\log 2 + f_2(\log 2,\log 2) + f_2\big(x-(n-1)\log 2,\; y-(n-1)\log 2\big)$$

$$= (n-1)\log 2 + f_2\big(x-(n-1)\log 2,\; y-(n-1)\log 2\big),$$

where the second equality follows from the proof on $\mathbb{Z}_4$, where we had that the minimum of such an expression is attained when $u, v = \log 2$. To show that equality is attained, choose $\alpha$ such that $p_X$ takes a constant value $\frac{\alpha}{2^{n-1}}$ on the subgroup of size $2^{n-1}$ of $\mathbb{Z}_{2^n}$ and a constant value $\frac{1-\alpha}{2^{n-1}}$ on the remaining half of $\mathbb{Z}_{2^n}$, such that $H(p_X) = x$. Similarly choose $\beta$ such that $p_Y$ takes a constant value $\frac{\beta}{2^{n-1}}$ on the subgroup of size $2^{n-1}$ of $\mathbb{Z}_{2^n}$ and a constant value $\frac{1-\beta}{2^{n-1}}$ on the remaining half of $\mathbb{Z}_{2^n}$, such that $H(p_Y) = y$. We have

$$x = H(p_X) = (n-1)\log 2 + h(\alpha),$$

$$y = H(p_Y) = (n-1)\log 2 + h(\beta).$$

It is easy to verify that

$$H(X+Y) = h(\alpha \star \beta) + (n-1)\log 2 = f_2\big(x-(n-1)\log 2,\; y-(n-1)\log 2\big) + (n-1)\log 2.$$

This completes the proof of the claim. □
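The equality-achieving construction in the proof of Claim 4.3 can be checked numerically. The following is a minimal sketch, assuming natural logarithms, the subgroup of size $2^{n-1}$ taken to be the even residues of $\mathbb{Z}_{2^n}$, and illustrative values of $n$, $\alpha$ and $\beta$ that are not taken from the paper; the helper names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)

def cyclic_convolve(p, q):
    """Law of X + Y (mod N) for independent X ~ p and Y ~ q on Z_N."""
    n = len(p)
    out = [0.0] * n
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[(i + j) % n] += pi * qj
    return out

def two_level(n, a):
    """Mass a spread uniformly over the even residues of Z_{2^n}
    (the index-2 subgroup), mass 1 - a spread uniformly over the odd ones."""
    size = 2 ** n
    half = size // 2
    return [a / half if i % 2 == 0 else (1.0 - a) / half for i in range(size)]

# Illustrative parameters (assumptions, not values from the paper).
n, alpha, beta = 3, 0.2, 0.35
p_x, p_y = two_level(n, alpha), two_level(n, beta)

# Binary convolution alpha * beta = alpha(1 - beta) + beta(1 - alpha).
star = alpha * (1 - beta) + beta * (1 - alpha)

lhs = entropy(cyclic_convolve(p_x, p_y))              # H(X + Y)
rhs = entropy([star, 1 - star]) + (n - 1) * math.log(2)
print(lhs, rhs)
```

The two printed values should agree up to floating-point error, which is the identity $H(X+Y) = h(\alpha \star \beta) + (n-1)\log 2$ used at the end of the proof.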



The above claims complete the proof of Theorem 4.1 □ 



Proof of Corollary 4.1. Consider $k\log 2 \le x \le (k+1)\log 2$ and the function $f_x(y) = f_{2^n}(x,y)$. We have

$$f_x(y) = \begin{cases} x & \text{if } 0 \le y \le k\log 2, \\ f_2(x-k\log 2,\; y-k\log 2) + k\log 2 & \text{if } k\log 2 < y < (k+1)\log 2, \\ y & \text{if } (k+1)\log 2 \le y. \end{cases}$$

This is immediately seen to be convex using MGL and Claim 3.1. □
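The piecewise form of $f_x(y)$ lends itself to a quick numerical sanity check of convexity. The sketch below assumes natural logarithms and computes $h^{-1}$ by bisection; the function names `h_inv`, `f2` and `f_2n` and the chosen values of $n$ and $x$ are illustrative assumptions, not notation from the paper.

```python
import math

LOG2 = math.log(2)

def h(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def h_inv(x):
    """Inverse of h restricted to [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f2(x, y):
    """f_2(x, y) = h(h^{-1}(x) * h^{-1}(y)), with * the binary convolution."""
    p, q = h_inv(x), h_inv(y)
    return h(p * (1 - q) + q * (1 - p))

def f_2n(x, y, n):
    """Piecewise formula: max(x, y) when x and y fall in different blocks of
    width log 2, and k log 2 + f_2 of the offsets when both lie in block k."""
    kx = min(int(x // LOG2), n - 1)
    ky = min(int(y // LOG2), n - 1)
    if kx != ky:
        return max(x, y)
    return kx * LOG2 + f2(x - kx * LOG2, y - kx * LOG2)

# Discrete second differences of y -> f_{2^n}(x, y) on a uniform grid.
n, x = 3, 1.0
step = n * LOG2 / 400
vals = [f_2n(x, i * step, n) for i in range(401)]
min_second_diff = min(vals[i - 1] - 2 * vals[i] + vals[i + 1] for i in range(1, 400))
print(min_second_diff)
```

A minimum second difference that is nonnegative up to rounding is consistent with convexity of $f_x(y)$; it is of course a numerical check on one slice, not a proof.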



5 An EPI and MGL for abelian groups of order $2^n$

We first prove a lemma. 

Lemma 5.1. Consider two abelian groups $G$ and $H$ with the corresponding functions $f_G$ and $f_H$, such that $f_G$ satisfies the generalized MGL. Then the following lower bound holds for $f_{G\oplus H}$:

$$f_{G\oplus H}(x,y) \ge \min_{u,v} f_H(u,v) + f_G(x-u,\, y-v), \tag{72}$$

where $u, v$ vary over

$$\max(0,\, x-\log|G|) \le u \le \min(\log|H|,\, x), \tag{73}$$

$$\max(0,\, y-\log|G|) \le v \le \min(\log|H|,\, y). \tag{74}$$

Proof of Lemma 5.1. We can write any probability distribution on $G\oplus H$ as a convex combination of probability distributions supported on the cosets of $G$. Note that there will be $|H|$ such cosets. Suppose $X$ and $Y$ are random variables taking values in $G\oplus H$. We can write $p_X$ and $p_Y$ as

$$p_X = \sum_{h\in H} \alpha_h p_h, \tag{75}$$

$$p_Y = \sum_{h\in H} \beta_h q_h, \tag{76}$$

where each $p_h$ (respectively $q_h$) is a distribution supported on the coset $(G,0)+(0,h)$. The distribution of $Z = X+Y$ can be broken down in a similar fashion as in (75), (76):

$$p_Z = \sum_{h\in H} \gamma_h r_h. \tag{77}$$

Here we have

$$\gamma_h = \sum_{i\in H} \alpha_i\beta_{h-i} = (\alpha \circledast_H \beta)_h, \tag{78}$$

$$r_h = \frac{\sum_{i\in H} (\alpha_i\beta_{h-i})(p_i \circledast_G q_{h-i})}{\gamma_h}. \tag{79}$$

Thus, using the chain rule of entropy, we can write $H(Z)$ as

$$H(Z) = H(\gamma) + \sum_{h\in H} \gamma_h H(r_h) \tag{80}$$

$$= H(\alpha \circledast_H \beta) + \sum_{h\in H} (\alpha \circledast_H \beta)_h\, H\!\left(\frac{\sum_{i\in H} (\alpha_i\beta_{h-i})(p_i \circledast_G q_{h-i})}{(\alpha \circledast_H \beta)_h}\right) \tag{81}$$

$$\ge H(\alpha \circledast_H \beta) + \sum_h\sum_i (\alpha_i\beta_{h-i})\, H(p_i \circledast_G q_{h-i}) \tag{82}$$

$$\ge f_H(H(\alpha),H(\beta)) + \sum_h\sum_i (\alpha_i\beta_{h-i})\, f_G(H(p_i),H(q_{h-i})) \tag{83}$$

$$= f_H(H(\alpha),H(\beta)) + \sum_i\sum_h (\alpha_i\beta_{h-i})\, f_G(H(p_i),H(q_{h-i})) \tag{84}$$

$$= f_H(H(\alpha),H(\beta)) + \sum_i \alpha_i \left[\sum_h \beta_{h-i}\, f_G(H(p_i),H(q_{h-i}))\right] \tag{85}$$

$$\ge f_H(H(\alpha),H(\beta)) + \sum_i \alpha_i\, f_G\!\left(H(p_i),\, \sum_h \beta_{h-i} H(q_{h-i})\right) \tag{86}$$

$$= f_H(H(\alpha),H(\beta)) + \sum_i \alpha_i\, f_G\!\left(H(p_i),\, \sum_h \beta_h H(q_h)\right) \tag{87}$$

$$\ge f_H(H(\alpha),H(\beta)) + f_G\!\left(\sum_h \alpha_h H(p_h),\, \sum_h \beta_h H(q_h)\right) \tag{88}$$

$$= f_H(H(\alpha),H(\beta)) + f_G\big(H(X)-H(\alpha),\; H(Y)-H(\beta)\big). \tag{89}$$

Here (82) follows from concavity of entropy, (83) follows from the definition of $f$, and (86) and (88) follow from $f_G$ satisfying the generalized MGL. Using the above, we get the lower bound

$$f_{G\oplus H}(x,y) \ge \min_{u,v} f_H(u,v) + f_G(x-u,\, y-v), \tag{90}$$

where $u, v$ vary over

$$\max(0,\, x-\log|G|) \le u \le \min(\log|H|,\, x), \tag{91}$$

$$\max(0,\, y-\log|G|) \le v \le \min(\log|H|,\, y). \tag{92}$$

□ 
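The coset decomposition used in the proof of Lemma 5.1 can be illustrated on a small example. The sketch below is an assumption-laden toy: it takes $G = \mathbb{Z}_4$ and $H = \mathbb{Z}_2$, draws random component distributions with a fixed seed, and checks the identity (78) for $\gamma$, the chain-rule expansion (80), and the concavity step (82). All helper names are hypothetical.

```python
import math
import random

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def convolve(p, q):
    """Cyclic convolution on Z_n, where n = len(p) = len(q)."""
    n = len(p)
    out = [0.0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] += p[i] * q[j]
    return out

def random_dist(n, rng):
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
G, H = 4, 2  # toy example: G = Z_4, H = Z_2 (an assumption for illustration)

alpha, beta = random_dist(H, rng), random_dist(H, rng)   # coset weights
p = [random_dist(G, rng) for _ in range(H)]              # p_h, law within coset h
q = [random_dist(G, rng) for _ in range(H)]              # q_h

# Joint laws on G (+) H, indexed as [h][g], and their convolution p_Z.
p_x = [[alpha[h] * p[h][g] for g in range(G)] for h in range(H)]
p_y = [[beta[h] * q[h][g] for g in range(G)] for h in range(H)]
p_z = [[0.0] * G for _ in range(H)]
for h1 in range(H):
    for g1 in range(G):
        for h2 in range(H):
            for g2 in range(G):
                p_z[(h1 + h2) % H][(g1 + g2) % G] += p_x[h1][g1] * p_y[h2][g2]

gamma = [sum(row) for row in p_z]          # coset weights of Z
gamma_conv = convolve(alpha, beta)         # right-hand side of (78)
h_z = entropy([v for row in p_z for v in row])
chain = entropy(gamma) + sum(
    gamma[h] * entropy([v / gamma[h] for v in p_z[h]]) for h in range(H))
bound_82 = entropy(gamma_conv) + sum(
    alpha[i] * beta[(h - i) % H] * entropy(convolve(p[i], q[(h - i) % H]))
    for h in range(H) for i in range(H))
print(h_z, chain, bound_82)
```

Here `h_z` and `chain` should coincide, and `bound_82` should not exceed them; the later steps of the proof, which rely on the generalized MGL for $f_G$, are not exercised by this sketch.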

Theorem 5.1. If $G$ is an abelian group of order $2^n$, then $f_G(x,y) = f_{2^n}(x,y)$.

Proof of Theorem 5.1. Assume

$$k_1\log 2 \le x \le (k_1+1)\log 2,$$

$$k_2\log 2 \le y \le (k_2+1)\log 2,$$

where $k_1 \ne k_2$. Without loss of generality, assume $k_1 > k_2$. Note that we have the trivial lower bound

$$f_G(x,y) \ge x \tag{93}$$

obtained from $H(X+Y) \ge H(X)$. Thus if we can find distributions for $X$ and $Y$ such that this lower bound is achieved, then it implies $f_G(x,y) = x$. This is exactly what we do. Let $G_1$ be a subgroup of $G$ of size $2^{k_1+1}$, and let $G_2$ be a subgroup of $G_1$ of size $2^{k_1}$. Consider the cosets of $G_2$ with respect to $G_1$, call them $C_0\,(= G_2)$ and $C_1$. Now consider the distribution of $X$ as taking a constant value on $C_0$ and a constant value on $C_1$ such that $H(X) = x$. Let the distribution of $Y$ be any arbitrary distribution on $C_0$ such that $H(Y) = y$. Notice that (in terms of coset addition)

$$C_0 + C_0 = C_0, \qquad C_0 + C_1 = C_1 + C_0 = C_1, \qquad C_1 + C_1 = C_0.$$

Since $Y$ is supported only on $C_0$, and $X$ is uniform on each of $C_0$ and $C_1$, it is easy to see that $X+Y$ is also uniform on each of $C_0$ and $C_1$ and in fact has the same distribution as $X$. This takes care of all cases when $k_1 \ne k_2$, and we need only concern ourselves with the case $k_1 = k_2 =: k$.
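The observation that $X+Y$ has the same distribution as $X$ in the case $k_1 \ne k_2$ can be seen concretely on a small cyclic group. This is a minimal sketch under stated assumptions: $G_1 = \mathbb{Z}_8$, $G_2 = C_0$ the even residues, $C_1$ the odd residues, with an arbitrarily chosen mass `a` on $C_0$ and a random law for $Y$ on $C_0$; none of these specific choices come from the paper.

```python
import random

def cyclic_convolve(p, q):
    """Law of X + Y (mod N) for independent X ~ p and Y ~ q on Z_N."""
    n = len(p)
    out = [0.0] * n
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[(i + j) % n] += pi * qj
    return out

N = 8     # G_1 = Z_8; C_0 = even residues (the subgroup G_2), C_1 = odd residues
a = 0.3   # total mass that X places on C_0 (illustrative value)

# X is constant on C_0 and constant on C_1.
p_x = [a / 4 if i % 2 == 0 else (1 - a) / 4 for i in range(N)]

# Y has an arbitrary law supported on C_0 only.
rng = random.Random(0)
w = [rng.random() if i % 2 == 0 else 0.0 for i in range(N)]
total = sum(w)
p_y = [v / total for v in w]

p_sum = cyclic_convolve(p_x, p_y)
max_gap = max(abs(s - x) for s, x in zip(p_sum, p_x))
print(max_gap)
```

The printed gap should be zero up to rounding, so $H(X+Y) = H(X) = x$ and the trivial lower bound (93) is met.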

Now either $G$ is a cyclic group of size $2^n$, or $G$ can be written as a direct sum $H_1 \oplus H_2$ where $H_1$ and $H_2$ are themselves abelian of size $2^{l_i}$, $i = 1, 2$ respectively. In the first case, there is nothing to prove. So assume the second case holds, and without loss of generality let $l_1 \le l_2$. Our proof will proceed in two steps: in the first step we show that $f_G(x,y) \le f_{2^n}(x,y)$, and in the second we show that $f_G(x,y) \ge f_{2^n}(x,y)$. We'll use induction in the second step, where we assume the theorem holds true for the smaller groups $H_1$ and $H_2$ and prove it for $G$.

Claim 5.1. $f_G(x,y) \le f_{2^n}(x,y)$.

Proof of Claim 5.1. As before, let

$$k\log 2 \le x \le (k+1)\log 2,$$

$$k\log 2 \le y \le (k+1)\log 2.$$

Consider a subgroup $G_1$ of size $2^{k+1}$, and a subgroup $G_2$ of $G_1$ of size $2^k$. Let $C_0$ and $C_1$ be the cosets of $G_2$ in $G_1$. We consider a distribution of $X$ which takes a constant value $\frac{x_0}{2^k}$ on $C_0$ and a constant value $\frac{1-x_0}{2^k}$ on $C_1$ and has $H(X) = x$. Similarly consider a distribution of $Y$ which takes constant values $\frac{y_0}{2^k}$ on $C_0$ and $\frac{1-y_0}{2^k}$ on $C_1$ and has $H(Y) = y$. We have

$$H(X) = x = h(x_0) + k\log 2,$$

$$H(Y) = y = h(y_0) + k\log 2.$$

It is easy to verify that

$$H(X+Y) = h(x_0 \star y_0) + k\log 2 = f_2(x-k\log 2,\; y-k\log 2) + k\log 2 = f_{2^n}(x,y).$$

By the definition of $f_G$, we get

$$f_G(x,y) \le f_{2^n}(x,y).$$

□



Claim 5.2. $f_G(x,y) \ge f_{2^n}(x,y)$.

Proof. By our assumptions, $G = H_1 \oplus H_2$ where $|H_1| = 2^{l_1}$, $|H_2| = 2^{l_2}$, with $l_1 + l_2 = n$ and without loss of generality $0 < l_1 \le l_2$. We also assume that the theorem holds for $H_1$ and $H_2$ and prove it by induction for $G$. By Lemma 5.1 we have the lower bound

$$f_{H_1\oplus H_2}(x,y) \ge \min_{u,v} f_{H_2}(u,v) + f_{H_1}(x-u,\, y-v), \tag{94}$$

where $u, v$ vary over

$$\max(0,\, x-\log|H_1|) \le u \le \min(\log|H_2|,\, x), \tag{95}$$

$$\max(0,\, y-\log|H_1|) \le v \le \min(\log|H_2|,\, y). \tag{96}$$

Note that (95) and (96) are equivalent, respectively, to

$$\max(0,\, x-\log|H_2|) \le (x-u) \le \min(\log|H_1|,\, x), \tag{24a}$$

$$\max(0,\, y-\log|H_2|) \le (y-v) \le \min(\log|H_1|,\, y). \tag{25a}$$

To facilitate the discussion, we term as a 'diagonal box' any square of the form

$$[t\log 2,\,(t+1)\log 2] \times [t\log 2,\,(t+1)\log 2],$$

for some integer $0 \le t \le n-1$.



First note that if $(u^*,v^*)$ achieves the minimum in (94), then it must be that $(u^*,v^*)$ is inside a diagonal box, and so is $(x-u^*,\, y-v^*)$. To see this, consider for instance the case when $(u^*,v^*)$ lies 'below' a diagonal box. In this case we can increase $v^*$ (till we hit the diagonal box) while keeping the value of $f_{H_2}(u^*,v^*)$ constant ($= u^*$) and simultaneously decrease the value of $f_{H_1}(x-u^*,\, y-v^*)$, thus decreasing the value of the sum. To be precise, suppose that

$$k\log 2 \le x, y \le (k+1)\log 2,$$

$$m\log 2 \le u^* \le \min(x,\,(m+1)\log 2),$$

where $m \le k$. Suppose also that

$$v^* < m\log 2.$$

Then we have

$$f_{H_2}(u^*,v^*) = f_{H_2}(u^*,\, m\log 2) = u^*$$

and by monotonicity of $f_{H_1}$ we also have

$$f_{H_1}(x-u^*,\, y-v^*) \ge f_{H_1}(x-u^*,\, y-m\log 2).$$

Note also that we have

$$m\log 2 \le k\log 2 \le y$$

and also that

$$m\log 2 \le u^* \le \log|H_2|.$$

Thus $m\log 2$ satisfies (96) and is a valid choice for $v$. This shows that the optimal $(u^*,v^*)$ can be taken to lie in the diagonal box $[m\log 2,\,(m+1)\log 2] \times [m\log 2,\,(m+1)\log 2]$. Similar logic holds when $(u^*,v^*)$ lies 'to the left' of a diagonal box, or when $(x-u^*,\, y-v^*)$ lies 'above' or 'to the left' of a diagonal box.

Our strategy will be as follows. We first use the above criteria on the optimal $(u^*,v^*)$ to restrict the domain of $(u,v)$ to a number of sub-rectangles of the diagonal boxes. We then use the induction hypothesis and reduce the problem of minimizing $f_{H_2}(u,v) + f_{H_1}(x-u,\, y-v)$ to that of minimizing $f_{2^{l_2}}(u,v) + f_{2^{l_1}}(x-u,\, y-v)$. We examine the value of $\min f_{2^{l_2}}(u,v) + f_{2^{l_1}}(x-u,\, y-v)$ over the rectangles, one rectangle at a time. The minimum over a single rectangle can be determined from the proof of the $\mathbb{Z}_{2^n}$ case, and it turns out to be $f_{2^n}(x,y)$ independent of which rectangle we choose. Thus the overall minimum also turns out to be $f_{2^n}(x,y)$.

Let $k\log 2 \le x, y \le (k+1)\log 2$. Let us write

$$x = k\log 2 + x',$$

$$y = k\log 2 + y',$$

where $0 \le x', y' \le \log 2$, and define the rectangles

$$R_0 = [0,x'] \times [0,y'],$$

$$R_1 = [\log 2,\, \log 2 + x'] \times [\log 2,\, \log 2 + y'],$$

$$\vdots$$

$$R_k = [k\log 2,\, k\log 2 + x'] \times [k\log 2,\, k\log 2 + y']$$

and

$$S_1 = [x',\, \log 2] \times [y',\, \log 2],$$

$$S_2 = [\log 2 + x',\, 2\log 2] \times [\log 2 + y',\, 2\log 2],$$

$$\vdots$$

$$S_k = [(k-1)\log 2 + x',\, k\log 2] \times [(k-1)\log 2 + y',\, k\log 2].$$

We consider three separate cases:

$$k+1 \le l_1,$$

$$l_1 + 1 \le k+1 \le l_2,$$

$$l_2 + 1 \le k+1 \le n.$$

In the first case, the set of $(u,v)$ that satisfy (95), (96) and such that $(u,v)$ and $(x-u,\, y-v)$ both lie in diagonal boxes is $\left(\cup_{m=0}^{k} R_m\right) \cup \left(\cup_{m=1}^{k} S_m\right)$. In the second case it is $\left(\cup_{m=k-l_1+1}^{k} R_m\right) \cup \left(\cup_{m=k-l_1+1}^{k} S_m\right)$, and in the third case it is $\left(\cup_{m=k-l_1+1}^{l_2-1} R_m\right) \cup \left(\cup_{m=k-l_1+1}^{l_2} S_m\right)$.



Fix $0 \le m \le k$ and consider

$$\min_{(u,v)\in R_m} \big(f_{H_2}(u,v) + f_{H_1}(x-u,\, y-v)\big),$$

assuming that we are in one of the three cases where all $(u,v) \in R_m$ satisfy equations (95), (96). Let us write

$$u = m\log 2 + u',$$

$$v = m\log 2 + v',$$

where $0 \le u' \le x'$ and $0 \le v' \le y'$. By the induction hypothesis we have

$$f_{H_2}(u,v) = f_{2^{l_2}}(u,v) = m\log 2 + f_2(u',v')$$

and

$$f_{H_1}(x-u,\, y-v) = f_{2^{l_1}}(x-u,\, y-v) = (k-m)\log 2 + f_2(x'-u',\, y'-v').$$

Hence

$$\min_{(u,v)\in R_m} \big(f_{H_2}(u,v) + f_{H_1}(x-u,\, y-v)\big) = k\log 2 + \min_{u',v'} f_2(u',v') + f_2(x'-u',\, y'-v')$$

$$\overset{(a)}{=} k\log 2 + f_2(x',y').$$

Here (a) follows from the proof of the $\mathbb{Z}_4$ case. Note that this equals $f_{2^n}(x,y)$.
Now fix $1 \le m \le k$ and consider

$$\min_{(u,v)\in S_m} \big(f_{H_2}(u,v) + f_{H_1}(x-u,\, y-v)\big),$$

assuming that we are in one of the three cases where all $(u,v) \in S_m$ satisfy equations (95), (96). Note that this is equivalent to requiring that we are in one of the cases where all $(x-u,\, y-v)$ for $(u,v) \in S_m$ satisfy (24a), (25a). Let us write

$$u = (m-1)\log 2 + u',$$

$$v = (m-1)\log 2 + v',$$

where $x' \le u' \le \log 2$ and $y' \le v' \le \log 2$. By the inductive hypothesis we have

$$f_{H_2}(u,v) = f_{2^{l_2}}(u,v) = (m-1)\log 2 + f_2(u',v').$$

Further, since

$$x-u = (k-m)\log 2 + \log 2 + x' - u',$$

$$y-v = (k-m)\log 2 + \log 2 + y' - v',$$

with $x' \le \log 2 + x' - u' \le \log 2$ and $y' \le \log 2 + y' - v' \le \log 2$, by the inductive hypothesis we have

$$f_{H_1}(x-u,\, y-v) = f_{2^{l_1}}(x-u,\, y-v) = (k-m)\log 2 + f_2(\log 2 + x' - u',\; \log 2 + y' - v').$$

Hence we have

$$\min_{(u,v)\in S_m} \big(f_{H_2}(u,v) + f_{H_1}(x-u,\, y-v)\big)$$

$$= (k-1)\log 2 + \min_{x'\le u'\le \log 2,\; y'\le v'\le \log 2} f_2(u',v') + f_2(\log 2 + x' - u',\; \log 2 + y' - v')$$

$$\overset{(a)}{=} (k-1)\log 2 + f_4(\log 2 + x',\; \log 2 + y')$$

$$\overset{(b)}{=} k\log 2 + f_2(x',y'),$$

where (a) and (b) again follow from the proof of the $\mathbb{Z}_4$ case. Note that this equals $f_{2^n}(x,y)$. This completes the proof of Claim 5.2 and thus of Theorem 5.1. □

□ 

6 Extensions 

In this section we will prove some extensions of the earlier results that seem to be of potential 
interest. 

6.1 Scalar and Vector MGL 

Claim 6.1. Let $X$, $Y$ and $Z$ be random variables taking values in an abelian group $G$ of order $2^n$, and let $U$ be an arbitrary random variable. Suppose $Z$ is independent of $(U,X)$ and $Y = X + Z$, where the addition is understood to be the group addition. Then

$$H(Y|U) \ge f_G(H(X|U),\, H(Z)).$$

Remark 4. In the case of binary random variables $X$, $Y$, and $Z$, where $Z \sim \mathrm{Bern}(p)$, $U$ is an arbitrary random variable, and $Z$ is independent of $(U,X)$, one has the scalar MGL given by

$$H(Y|U) \ge h\big(h^{-1}(H(X|U)) \star p\big).$$

Thus, Claim 6.1 can be thought of as the generalization of this scalar MGL for random variables taking values in an abelian group of order $2^n$.



Proof of Claim 6.1.

$$H(Y|U) = \sum_u P(U=u)\, H(Y|U=u) \tag{97}$$

$$= \sum_u P(U=u)\, H(X+Z|U=u) \tag{98}$$

$$\ge \sum_u P(U=u)\, f_G(H(X|U=u),\, H(Z)) \tag{99}$$

$$\ge f_G\Big(\sum_u P(U=u)\, H(X|U=u),\; H(Z)\Big) \tag{100}$$

$$= f_G(H(X|U),\, H(Z)), \tag{101}$$

where (99) follows from the definition of $f_G$ and (100) follows from the convexity of $f_G(x,y)$ in $x$ for fixed $y$. □
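For the binary case described in Remark 4, the scalar MGL can be spot-checked numerically. The sketch below is only an illustration under assumptions: natural logarithms, a finite alphabet for $U$ drawn at random, $h^{-1}$ computed by bisection, and hypothetical helper names. It records the smallest observed slack $H(Y|U) - h(h^{-1}(H(X|U)) \star p)$ over random trials.

```python
import math
import random

def h(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def h_inv(x):
    """Inverse of h restricted to [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def star(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1 - b) + b * (1 - a)

rng = random.Random(1)
worst_slack = float("inf")
for _ in range(1000):
    m = rng.randint(1, 5)                       # size of the alphabet of U
    w = [rng.random() for _ in range(m)]
    s = sum(w)
    p_u = [x / s for x in w]                    # P(U = u)
    a = [rng.random() for _ in range(m)]        # P(X = 1 | U = u)
    p = rng.random()                            # Z ~ Bern(p), independent of (U, X)
    h_y_given_u = sum(p_u[u] * h(star(a[u], p)) for u in range(m))
    h_x_given_u = sum(p_u[u] * h(a[u]) for u in range(m))
    bound = h(star(h_inv(h_x_given_u), p))      # right-hand side of the scalar MGL
    worst_slack = min(worst_slack, h_y_given_u - bound)
print(worst_slack)
```

A worst-case slack that is nonnegative up to rounding is what steps (99) and (100) predict, since (100) is Jensen's inequality applied to the convex map $x \mapsto f_G(x, H(Z))$.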

Claim 6.2. Let $X^k$ be a random vector each of whose coordinates takes values in an abelian group $G$ of order $2^n$, and let $U$ be an arbitrary random variable. If $Z^k$ is a vector of independent and identically distributed $G$-valued random variables, each distributed according to $p_Z$, and $Z^k$ is independent of $(X^k, U)$, with $Y^k = X^k + Z^k$, then

$$\frac{H(Y^k|U)}{k} \ge f_G\left(\frac{H(X^k|U)}{k},\; H(Z)\right).$$

Remark 5. Claim 6.2 in the case of binary random variables, where $Z^k$ is a vector of i.i.d. random variables having distribution $\mathrm{Bern}(p)$, is given by

$$\frac{H(Y^k|U)}{k} \ge h\left(h^{-1}\left(\frac{H(X^k|U)}{k}\right) \star p\right)$$

and is known to be true. Thus, Claim 6.2 can be thought of as the vector MGL for random variables taking values in an abelian group of order $2^n$.



Proof of Claim 6.2.

$$\frac{H(Y^k|U)}{k} = \frac{1}{k}\sum_{i=1}^{k} H(Y_i|U, Y^{i-1}) \tag{102}$$

$$\ge \frac{1}{k}\sum_{i=1}^{k} H(Y_i|U, Y^{i-1}, X^{i-1}) \tag{103}$$

$$= \frac{1}{k}\sum_{i=1}^{k} H(Y_i|U, X^{i-1}) \tag{104}$$

$$\ge \frac{1}{k}\sum_{i=1}^{k} f_G\big(H(X_i|U, X^{i-1}),\; H(Z)\big) \tag{105}$$

$$\ge f_G\left(\frac{1}{k}\sum_{i=1}^{k} H(X_i|U, X^{i-1}),\; H(Z)\right) \tag{106}$$

$$= f_G\left(\frac{H(X^k|U)}{k},\; H(Z)\right). \tag{107}$$

Here, (103) is because conditioning reduces entropy, (104) is because the channel from $X$ to $Y$ is a DMC, (105) follows from the scalar MGL, and (106) is because of the convexity of $f_G(x,y)$ in $x$ for fixed $y$. □

6.2 The minimum entropy of a sum of k > 2 independent G-valued random variables with fixed 
entropies 

Consider an abelian group $G$ of order $2^n$ and $k > 2$ independent random variables $X_1, X_2, \ldots, X_k$ taking values in $G$. We define the function

$$f_{G,k}(x_1, x_2, \ldots, x_k) := \min_{H(X_i)=x_i,\; 1\le i\le k} H(X_1 + X_2 + \cdots + X_k). \tag{108}$$

The function $f_{G,1}$ is the identity function, whereas our earlier function $f_G$ can be thought of as $f_{G,2}$.

We divide the interval $[0, n\log 2]$ into $n$ blocks of size $\log 2$, namely $[i\log 2,\,(i+1)\log 2]$, where $0 \le i \le n-1$. We bin $x_1, x_2, \ldots, x_k$ into these $n$ bins and consider the largest $m$ such that $m\log 2 \le x_l \le (m+1)\log 2$ for some $1 \le l \le k$. Let the contents of this bin be $x^1, x^2, \ldots, x^r$ where $r \le k$. Call the corresponding random variables $X^1, X^2, \ldots, X^r$. We claim the following:

Claim 6.3.

$$\min_{H(X^i)=x^i,\; 1\le i\le r} H(X^1 + X^2 + \cdots + X^r) = f_{G,2}\big(x^1, f_{G,2}(x^2, (\ldots(f_{G,2}(x^{r-1}, x^r)))\ldots)\big).$$

Proof of Claim 6.3. Note that

$$H(X^r + X^{r-1}) \ge f_{G,2}(x^r, x^{r-1}),$$

by definition of $f_{G,2}$. Now by monotonicity of $f_{G,2}$, we also have

$$H(X^{r-2} + X^{r-1} + X^r) \ge f_{G,2}\big(x^{r-2}, H(X^{r-1} + X^r)\big) \ge f_{G,2}\big(x^{r-2}, f_{G,2}(x^{r-1}, x^r)\big).$$

Continuing in a similar fashion, we get

$$H(X^1 + X^2 + \cdots + X^r) \ge f_{G,2}\big(x^1, f_{G,2}(x^2, (\ldots(f_{G,2}(x^{r-1}, x^r)))\ldots)\big)$$

for whatever choice of distributions of $X^1, X^2, \ldots, X^r$. This gives us the lower bound

$$\min_{H(X^i)=x^i,\; 1\le i\le r} H(X^1 + X^2 + \cdots + X^r) \ge f_{G,2}\big(x^1, f_{G,2}(x^2, (\ldots(f_{G,2}(x^{r-1}, x^r)))\ldots)\big). \tag{109}$$

Now consider a group $H_1$ of order $2^{m+1}$ and its subgroup $H_2$ of order $2^m$. Let $X^1, X^2, \ldots, X^r$ have distributions supported on $H_1$ such that they take constant values on the cosets $H_1/H_2$ and satisfy $H(X^i) = x^i$ for $1 \le i \le r$. Let these distributions be $p_{X^1}, p_{X^2}, \ldots, p_{X^r}$. For this choice of distributions, we have

$$H(X^r + X^{r-1}) = f_{G,2}(x^r, x^{r-1}),$$

since these distributions achieve equality for $f_{G,2}$. We also have

$$H(X^{r-2} + X^{r-1} + X^r) = f_{G,2}\big(x^{r-2}, f_{G,2}(x^{r-1}, x^r)\big),$$

as $p_{X^{r-2}}$ and $p_{X^{r-1}} \circledast_G p_{X^r}$ are equality-achieving distributions for $f_{G,2}$. Continuing similarly, we see that the lower bound is achieved, thus proving the claim. □

Claim 6.4.

$$f_{G,k}(x_1, \ldots, x_k) = f_{G,2}\big(x^1, f_{G,2}(x^2, (\ldots(f_{G,2}(x^{r-1}, x^r)))\ldots)\big).$$

Proof of Claim 6.4. Note that since $k \ge r$, and by Claim 6.3, we have the lower bound

$$f_{G,k}(x_1, \ldots, x_k) \ge f_{G,r}(x^1, x^2, \ldots, x^r) = f_{G,2}\big(x^1, f_{G,2}(x^2, (\ldots(f_{G,2}(x^{r-1}, x^r)))\ldots)\big). \tag{110}$$

We'll show that this lower bound is attained. Consider a group $H_1$ of size $2^{m+1}$ and its subgroup $H_2$ of size $2^m$. Define distributions of $X^1, X^2, \ldots, X^r$ supported on $H_1$ such that they take constant values on the cosets $H_1/H_2$ and satisfy $H(X^i) = x^i$ for $1 \le i \le r$. Let the remaining random variables take arbitrary distributions supported on either of the two cosets of $H_2$ in $H_1$, and such that they satisfy the entropy constraints. It is easily checked that

$$p_{X_1} \circledast_G p_{X_2} \circledast_G \cdots \circledast_G p_{X_k} = p_{X^1} \circledast_G p_{X^2} \circledast_G \cdots \circledast_G p_{X^r}, \tag{111}$$

giving us

$$H(X_1 + X_2 + \cdots + X_k) = H(X^1 + X^2 + \cdots + X^r) = f_{G,2}\big(x^1, f_{G,2}(x^2, (\ldots(f_{G,2}(x^{r-1}, x^r)))\ldots)\big),$$

where the second equality follows from Claim 6.3. By the definition of $f_{G,k}$ this gives us

$$f_{G,k}(x_1, \ldots, x_k) \le f_{G,2}\big(x^1, f_{G,2}(x^2, (\ldots(f_{G,2}(x^{r-1}, x^r)))\ldots)\big). \tag{112}$$

Equations (110) and (112) prove Claim 6.4. □



Theorem 6.1. Given any $x_1, x_2, \ldots, x_k$ we have

$$f_{G,k}(x_1, x_2, \ldots, x_k) = f_{G,2}\big(x_1, f_{G,2}(x_2, (\ldots(f_{G,2}(x_{k-1}, x_k)))\ldots)\big).$$

Proof of Theorem 6.1. Let $r$ be as before and let $x_{i_1}, x_{i_2}, \ldots, x_{i_r}$ be those $x_j$'s which land in the largest bin, $[m\log 2,\,(m+1)\log 2]$. Let $1 \le i_1 < i_2 < \cdots < i_r \le k$. It is easy to see that

$$f_{G,2}\big(x_{i_{r-1}}, f_{G,2}(x_{i_{r-1}+1}, f_{G,2}(\ldots, x_k)\ldots)\big) = f_{G,2}(x_{i_{r-1}}, x_{i_r}).$$

Continuing in a similar manner, we get

$$f_{G,2}\big(x_1, f_{G,2}(x_2, (\ldots(f_{G,2}(x_{k-1}, x_k)))\ldots)\big) = f_{G,2}\big(x_{i_1}, f_{G,2}(x_{i_2}, (\ldots(f_{G,2}(x_{i_{r-1}}, x_{i_r})))\ldots)\big),$$

which by Claim 6.4 is $f_{G,k}(x_1, x_2, \ldots, x_k)$, thus proving Theorem 6.1.

□
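The nested formula of Theorem 6.1 and its collapse onto the largest bin can be illustrated numerically. The sketch below assumes natural logarithms and uses the piecewise expression for $f_{2^n}$ in place of $f_{G,2}$, which Theorem 5.1 permits for groups of order $2^n$; the helper names and the sample entropies in `xs` are illustrative assumptions.

```python
import math

LOG2 = math.log(2)

def h(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def h_inv(x):
    """Inverse of h restricted to [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f_2n(x, y, n):
    """max(x, y) across different bins; k log 2 + f_2(offsets) within bin k."""
    kx = min(int(x // LOG2), n - 1)
    ky = min(int(y // LOG2), n - 1)
    if kx != ky:
        return max(x, y)
    p, q = h_inv(x - kx * LOG2), h_inv(y - kx * LOG2)
    return kx * LOG2 + h(p * (1 - q) + q * (1 - p))

def nested(xs, n):
    """Evaluate f(x_1, f(x_2, ... f(x_{k-1}, x_k))) from the inside out."""
    acc = xs[-1]
    for x in reversed(xs[:-1]):
        acc = f_2n(x, acc, n)
    return acc

n = 3
xs = [0.3, 1.0, 0.9, 0.2]                      # illustrative entropies in nats
m = max(int(x // LOG2) for x in xs)            # index of the largest occupied bin
top = [x for x in xs if int(x // LOG2) == m]   # entries landing in that bin

full_value = nested(xs, n)
top_value = nested(top, n)
print(full_value, top_value)
```

Both values should coincide, reflecting that entries outside the largest bin do not affect the nested expression.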



Corollary 6.1. $f_{G,k}(x_1, x_2, \ldots, x_k)$ is convex in each variable, when the remaining are kept fixed.

Proof of Corollary 6.1. Without loss of generality, consider $x_k$ as varying and the remaining variables fixed. As before, let the largest bin in which at least one $x_i$ is present be $[m\log 2,\,(m+1)\log 2]$. Now as long as $x_k < m\log 2$,

$$f_{G,k}(x_1, \ldots, x_k) = f_{G,k-1}(x_1, x_2, \ldots, x_{k-1}),$$

which is constant as a function of $x_k$. For $m\log 2 \le x_k \le (m+1)\log 2$, we have that

$$f_{G,k}(x_1, x_2, \ldots, x_k) = f_{G,2}\big(x_k, f_{G,k-1}(x_1, x_2, \ldots, x_{k-1})\big),$$

which is convex in $x_k$ by MGL. For $x_k > (m+1)\log 2$,

$$f_{G,k}(x_1, x_2, \ldots, x_k) = x_k.$$

Now the convexity easily follows from MGL and Claim 3.1. □

A Proof of Lemma 2.1


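We'll first compute $\frac{\partial f}{\partial x}$. Let $x = h(p)$ and $y = h(q)$ with $0 \le p, q \le \frac{1}{2}$, so $f(x,y) = h(p \star q)$.

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial p}\frac{dp}{dx} \tag{113}$$

$$= (1-2q)\log\left(\frac{1-p\star q}{p\star q}\right) \times \frac{1}{\log\left(\frac{1-p}{p}\right)}. \tag{114}$$

Notice that as $x$ moves along a line with slope $\theta > 0$, $p$ and $q$ both strictly increase and consequently the function $(1-2q)$ strictly decreases. Therefore to show $\frac{\partial f}{\partial x}$ strictly decreases, it is enough to show that $\log\left(\frac{1-p\star q}{p\star q}\right) \times \frac{1}{\log\left(\frac{1-p}{p}\right)}$ monotonically decreases along the line. Now let us compute the directional derivative of $g(x,y) = \frac{\log\left(\frac{1-p\star q}{p\star q}\right)}{\log\left(\frac{1-p}{p}\right)}$ at a point $(x,y) \in (0,1)\times(0,1)$, as we move in a direction $(1,\theta)$.

$$\frac{\partial g}{\partial x} + \theta\frac{\partial g}{\partial y} = \frac{\partial g}{\partial p}\frac{dp}{dx} + \theta\frac{\partial g}{\partial q}\frac{dq}{dy} \tag{115}$$

$$= \left[\frac{-(1-2q)}{(p\star q)(1-p\star q)\log\left(\frac{1-p}{p}\right)} + \frac{\log\left(\frac{1-p\star q}{p\star q}\right)}{p(1-p)\left(\log\left(\frac{1-p}{p}\right)\right)^2}\right]\frac{1}{\log\left(\frac{1-p}{p}\right)} + \theta\left[\frac{-(1-2p)}{(p\star q)(1-p\star q)\log\left(\frac{1-p}{p}\right)}\right]\frac{1}{\log\left(\frac{1-q}{q}\right)} \tag{116}$$

$$= \frac{-(1-2q)}{(p\star q)(1-p\star q)\left(\log\left(\frac{1-p}{p}\right)\right)^2} + \frac{\log\left(\frac{1-p\star q}{p\star q}\right)}{p(1-p)\left(\log\left(\frac{1-p}{p}\right)\right)^3} - \frac{\theta(1-2p)}{(p\star q)(1-p\star q)\log\left(\frac{1-p}{p}\right)\log\left(\frac{1-q}{q}\right)}. \tag{117}$$

Now we choose $\theta = \frac{h(q)}{h(p)}$. We want to show that with this choice of $\theta$, (117) $< 0$, since this would mean $g(x,y)$ decreases as we move in the desired direction. Thus we see that it suffices to show

$$\frac{-(1-2q)}{(p\star q)(1-p\star q)\left(\log\left(\frac{1-p}{p}\right)\right)^2} + \frac{\log\left(\frac{1-p\star q}{p\star q}\right)}{p(1-p)\left(\log\left(\frac{1-p}{p}\right)\right)^3} - \frac{h(q)(1-2p)}{h(p)(p\star q)(1-p\star q)\log\left(\frac{1-p}{p}\right)\log\left(\frac{1-q}{q}\right)} \overset{?}{<} 0. \tag{118}$$

Note that since $(x,y)$ lies in the interior, $0 < p, q < \frac{1}{2}$. Multiplying throughout by $\log\left(\frac{1-p}{p}\right)$ we need to show

$$\frac{-(1-2q)}{(p\star q)(1-p\star q)\log\left(\frac{1-p}{p}\right)} + \frac{\log\left(\frac{1-p\star q}{p\star q}\right)}{p(1-p)\left(\log\left(\frac{1-p}{p}\right)\right)^2} - \frac{h(q)(1-2p)}{h(p)(p\star q)(1-p\star q)\log\left(\frac{1-q}{q}\right)} \overset{?}{<} 0. \tag{119}$$

Taking the negative terms to the other side, we need to show

$$\frac{\log\left(\frac{1-p\star q}{p\star q}\right)}{p(1-p)\left(\log\left(\frac{1-p}{p}\right)\right)^2} \overset{?}{<} \frac{(1-2q)}{(p\star q)(1-p\star q)\log\left(\frac{1-p}{p}\right)} + \frac{h(q)(1-2p)}{h(p)(p\star q)(1-p\star q)\log\left(\frac{1-q}{q}\right)}. \tag{120}$$

Multiplying by $(p\star q)(1-p\star q)$ on both sides, we need to show

$$\frac{(p\star q)(1-p\star q)\log\left(\frac{1-p\star q}{p\star q}\right)}{p(1-p)\left(\log\left(\frac{1-p}{p}\right)\right)^2} \overset{?}{<} \frac{(1-2q)}{\log\left(\frac{1-p}{p}\right)} + \frac{h(q)(1-2p)}{h(p)\log\left(\frac{1-q}{q}\right)}. \tag{121}$$

Now multiplying both sides by $p(1-p)\left(\log\left(\frac{1-p}{p}\right)\right)^2$, we need to show

$$(p\star q)(1-p\star q)\log\left(\frac{1-p\star q}{p\star q}\right) \overset{?}{<} (1-2q)\,p(1-p)\log\left(\frac{1-p}{p}\right) + \frac{p(1-p)(1-2p)\left(\log\left(\frac{1-p}{p}\right)\right)^2 h(q)}{h(p)\log\left(\frac{1-q}{q}\right)}. \tag{122}$$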

We'll now analyse (122) by keeping the left side fixed and finding the minimum of the right side.

Let $p\star q = k$. Note that $p \le k$ and $q = \frac{k-p}{1-2p}$. Observe that when $p = k$, $q = 0$ and the first term on the right side equals the left side, whereas the second term is $0$ (it is easy to see that $\frac{h(q)}{\log\left(\frac{1-q}{q}\right)} \to 0$ as $q \to 0$). Thus, it will be sufficient to show that the right hand side is a decreasing function of $p$ if $p\star q$ is fixed. Substitute $q$ in the first term to get

$$\frac{(1-2k)}{(1-2p)}\,p(1-p)\log\left(\frac{1-p}{p}\right) + \frac{p(1-p)(1-2p)\left(\log\left(\frac{1-p}{p}\right)\right)^2 h(q)}{h(p)\log\left(\frac{1-q}{q}\right)}. \tag{123}$$

Showing (123) decreases in $p$ for a fixed $k$ is equivalent to showing $A(p,k)$ decreases in $p$, where $A$ is given by

$$A(p,k) = \frac{1}{(1-2p)}\,p(1-p)\log\left(\frac{1-p}{p}\right) + \frac{p(1-p)(1-2p)\left(\log\left(\frac{1-p}{p}\right)\right)^2 h(q)}{h(p)\log\left(\frac{1-q}{q}\right)(1-2k)}. \tag{124}$$
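For ease of notation, rename the following functions:

$$M(p) := \frac{p(1-p)\log\left(\frac{1-p}{p}\right)}{(1-2p)},$$

$$N(p) := \frac{p(1-p)(1-2p)\left(\log\left(\frac{1-p}{p}\right)\right)^2}{h(p)},$$

$$L(q) := \frac{h(q)}{\log\left(\frac{1-q}{q}\right)}.$$

So we have (note that in the equation below $q$ is thought of as a function of $p$ and $k$)

$$A(p,k) = M(p) + \frac{N(p)L(q)}{1-2k}. \tag{125}$$

Differentiating w.r.t. $p$, we get

$$B(p,k) = M'(p) + \frac{N'(p)L(q)}{1-2k} + \frac{N(p)\frac{dL(q)}{dq}\frac{\partial q}{\partial p}}{1-2k} \tag{126}$$

$$= M'(p) + \frac{N'(p)L(q)}{1-2k} - \frac{N(p)}{(1-2p)^2}\frac{dL(q)}{dq}, \tag{127}$$

where (127) is obtained using $\frac{\partial q}{\partial p} = -\frac{(1-2k)}{(1-2p)^2}$. We want to show that $B(p,k) < 0$ for all valid pairs $(p,k)$ (a pair is valid if $0 < p \le k$). It is therefore sufficient to show that $\max_{k\ge p} B(p,k) < 0$. We now make two claims.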

Claim A.1. $N'(p) \le 0$, i.e. $N(p)$ is a decreasing function of $p$, as $p$ goes from $0$ to $\frac{1}{2}$.

Figure 1: Plot of $N(p)$

Claim A.2. $L(x) = \frac{h(x)}{\log\left(\frac{1-x}{x}\right)}$ is an increasing function as $x$ goes from $0$ to $\frac{1}{2}$, and $L'(x)$ is minimum at $x = 0$.

Suppose we did term-by-term maximisation of $B(p,k)$ as $k$ varies. $M'(p)$ does not depend on $k$, so we don't need to care about it. Now for the second term, since $N'(p) \le 0$ (by Claim A.1), to maximise $B$ we need to minimise $\frac{L(q)}{1-2k}$ as a function of $k$. Now we note that as $k \uparrow$, $q \uparrow$ and by Claim A.2 we get $L(q) \uparrow$. Also clearly as $k \uparrow$, $\frac{1}{1-2k} \uparrow$. Thus $\frac{L(q)}{1-2k}$ increases in $k$, and to minimise it, the best choice of $k$ is the minimum possible $k$, which is $p$. For the third term, because of the minus sign, we need to minimise $\frac{dL(q)}{dq}$. By Claim A.2 we see that this happens when $q = 0$, which happens when $k$ equals $p$. Thus, the above discussion leads us to conclude that $\arg\max_{k\ge p} B(p,k) = p$. It therefore suffices to show that

$$B(p,p) < 0 \quad \text{for all } 0 < p < \frac{1}{2}. \tag{128}$$

Figure 2: Plot of $L(p)$

Having motivated the claims, we'll now prove them.



Proof of Claim A.1. Let's recall $N(p)$:

$$N(p) = \frac{p(1-p)(1-2p)\left(\log\left(\frac{1-p}{p}\right)\right)^2}{h(p)}.$$

Since $(1-p)$, $(1-2p)$ and $\log\left(\frac{1-p}{p}\right)$ are decreasing functions of $p$, we conclude that it suffices to prove that

$$\tilde{N}(p) := \frac{p\log\left(\frac{1-p}{p}\right)}{h(p)}$$

decreases in $p$. Differentiating $\tilde{N}$ and simplifying, we get that it suffices to show

$$\hat{N}(p) := h(p)\log\left(\frac{1-p}{p}\right) - \frac{h(p)}{(1-p)} - p\left(\log\left(\frac{1-p}{p}\right)\right)^2 \le 0. \tag{129}$$

Now as $p \to 0$, $\hat{N}$ tends to $0$. Thus, to show that it is negative we'll show that $\hat{N}' < 0$. Differentiating again, and simplifying, we get that it suffices to show

$$-h(p)\left(\frac{1}{p} + \frac{1}{1-p}\right) + \log\left(\frac{1-p}{p}\right) < 0. \tag{130}$$

Now we expand $h(p) = -p\log(p) - (1-p)\log(1-p)$ and simplify (130) to get

$$\big(p\log(p) + (1-p)\log(1-p)\big)\left(\frac{1}{p} + \frac{1}{1-p}\right) + \log\left(\frac{1-p}{p}\right) < 0 \tag{131}$$

$$\iff \log(p) + \frac{p}{1-p}\log(p) + \frac{1-p}{p}\log(1-p) + \log(1-p) + \log(1-p) - \log(p) < 0 \tag{132}$$

$$\iff 2\log(1-p) + \frac{p}{1-p}\log(p) + \frac{1-p}{p}\log(1-p) < 0, \tag{133}$$

which is immediate since $0 < p,\, 1-p < 1$. This proves Claim A.1. □



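Proof of Claim A.2. Recalling $L(x)$,

$$L(x) := \frac{h(x)}{\log\left(\frac{1-x}{x}\right)}.$$

Differentiating,

$$L'(x) = 1 + \frac{h(x)}{x(1-x)\left(\log\left(\frac{1-x}{x}\right)\right)^2} \ge 1 > 0. \tag{134}$$

Thus $L(x)$ is clearly an increasing function. To show that $L'(x)$ is minimum at $x = 0$, we'll show that $L'(0) = 1$.

$$\lim_{x\to 0} \frac{h(x)}{x(1-x)\left(\log\left(\frac{1-x}{x}\right)\right)^2} = \lim_{x\to 0} \frac{\log\left(\frac{1-x}{x}\right)}{(1-2x)\left(\log\left(\frac{1-x}{x}\right)\right)^2 - 2\log\left(\frac{1-x}{x}\right)}$$

$$= \lim_{x\to 0} \frac{1}{(1-2x)\log\left(\frac{1-x}{x}\right) - 2}$$

$$= 0.$$

This proves Claim A.2. □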



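Coming back to (128),

$$B(p,p) = M'(p) + \frac{N'(p)L(0)}{1-2p} - \frac{N(p)}{(1-2p)^2}\left.\frac{dL(q)}{dq}\right|_{q=0}, \tag{135}$$

where

$$L(0) = \lim_{q\to 0} \frac{h(q)}{\log\left(\frac{1-q}{q}\right)} = 0.$$

By the proof of Claim A.2 we also know that

$$\lim_{x\to 0} \frac{dL(x)}{dx} = 1.$$

Using this, we get

$$B(p,p) = M'(p) - \frac{N(p)}{(1-2p)^2}. \tag{136}$$

We want to show that

$$M'(p) - \frac{N(p)}{(1-2p)^2} \overset{?}{<} 0 \tag{137}$$

$$\iff \frac{-1}{(1-2p)} + \log\left(\frac{1-p}{p}\right) + \frac{2p(1-p)}{(1-2p)^2}\log\left(\frac{1-p}{p}\right) - \frac{p(1-p)\left(\log\left(\frac{1-p}{p}\right)\right)^2}{h(p)(1-2p)} \overset{?}{<} 0 \tag{138}$$

$$\iff \frac{-1}{(1-2p)} + \log\left(\frac{1-p}{p}\right) + \frac{2p(1-p)}{(1-2p)^2}\log\left(\frac{1-p}{p}\right) \overset{?}{<} \frac{p(1-p)\left(\log\left(\frac{1-p}{p}\right)\right)^2}{h(p)(1-2p)} \tag{139}$$

$$\iff -(1-2p) + (1-2p)^2\log\left(\frac{1-p}{p}\right) + 2p(1-p)\log\left(\frac{1-p}{p}\right) \overset{?}{<} \frac{p(1-p)(1-2p)\left(\log\left(\frac{1-p}{p}\right)\right)^2}{h(p)} \tag{140}$$

$$\iff h(p)\left(-(1-2p) + (2p^2-2p+1)\log\left(\frac{1-p}{p}\right)\right) \overset{?}{<} p(1-p)(1-2p)\left(\log\left(\frac{1-p}{p}\right)\right)^2. \tag{141}$$

Now we expand $h(p) = -p\log(p) - (1-p)\log(1-p)$ and $\log\left(\frac{1-p}{p}\right) = \log(1-p) - \log(p)$ and evaluate both sides of this inequality while collecting the coefficients of $(\log p)^2$, $(\log(1-p))^2$, $\log(p)\log(1-p)$, $\log(p)$ and $\log(1-p)$. After cancellation, we get that we need to show

$$p^2(\log(p))^2 - (1-p)^2(\log(1-p))^2 + (1-2p)\big(\log(p)\log(1-p) + p\log(p) + (1-p)\log(1-p)\big) \overset{?}{<} 0. \tag{142}$$

Define

$$F(p) := p^2(\log(p))^2 - (1-p)^2(\log(1-p))^2 + (1-2p)\big(\log(p)\log(1-p) + p\log(p) + (1-p)\log(1-p)\big). \tag{143}$$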


Figure 3: Plot of $F(p)$
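The sign of $F(p)$ from (143) can be inspected numerically on a grid, which mirrors what the plot conveys. This is a minimal sketch assuming natural logarithms (consistent with the derivative values quoted in the proof of Claim A.3); the grid resolution is an arbitrary choice and the check is not a substitute for the proof.

```python
import math

def F(p):
    """F(p) as defined in (143), with natural logarithms."""
    a, b = math.log(p), math.log(1 - p)
    return (p * p * a * a
            - (1 - p) ** 2 * b * b
            + (1 - 2 * p) * (a * b + p * a + (1 - p) * b))

# Evaluate on a uniform grid strictly inside (0, 1/2).
grid = [i / 1000 for i in range(1, 491)]
values = [F(p) for p in grid]
largest = max(values)
smallest = min(values)
print(largest, smallest)
```

On this grid every value is negative, in agreement with the inequality stated as Claim A.3 below.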



Claim A.3. $F(p) \le 0$.

Proof of Claim A.3. We'll compute the first few derivatives of $F$, and their values at $p = 0$ and $p = \frac{1}{2}$. We'll use $F^{(n)}(p)$ to indicate the $n$-th derivative.

$$F^{(1)}(p) = \frac{1}{(1-p)p}\Big(2(1-p)^2 p\,(\log(1-p))^2 - p^2\log(p)\big(1-2p-2(1-p)\log(p)\big) + (1-p)\log(1-p)\big(1-3p+2p^2-2p\log(p)\big)\Big). \tag{144}$$

We observe that $F^{(1)}(0) = -1$ and $F^{(1)}(0.5) = 0$. (Note that $F^{(1)}(0)$ is computed in the limit.) Now consider the second derivative

$$F^{(2)}(p) = \frac{1}{(1-p)^2p^2}\Big((-1+p^2+2p^3-2p^4)\log(1-p) - 2(1-p)^2p^2(\log(1-p))^2 + p\big(-1+3p-2p^2+p(5-6p+2p^2)\log(p) + 2(1-p)^2p\,(\log(p))^2\big)\Big). \tag{145}$$

Again, evaluating in the limit we see $F^{(2)}(0) \to +\infty$ and $F^{(2)}(0.5) = 0$. Now we compute the third derivative

$$F^{(3)}(p) = \frac{2}{(1-p)^3p^3}\Big((1-p)^2(1-p^2+2p^3)\log(1-p) + p\big(1+p-4p^2+2p^3+p(2-4p+5p^2-2p^3)\log(p)\big)\Big). \tag{146}$$

We evaluate and check that in the limit $F^{(3)}(0) \to -\infty$ and $F^{(3)}(0.5) > 0$. Now suppose for some $0 < p < \frac{1}{2}$, it were to be the case that $F(p) > 0$. Since $F(0) = 0$, $F(0.5) = 0$ and $F^{(1)}(0) = -1$, we see that $F$ must have a zero in $(0, 0.5)$. Now applying Rolle's theorem [32] twice, we get that $F^{(1)}$ must have 2 zeros in $(0, 0.5)$. We also have $F^{(1)}(0.5) = 0$, which means we can use Rolle's theorem again to conclude that $F^{(2)}$ must have at least 2 zeros in $(0, 0.5)$. Using $F^{(2)}(0.5) = 0$, and using Rolle's theorem again, we get that $F^{(3)}$ must have at least 2 zeros in $(0, 0.5)$. Thus, if we can show that $F^{(3)}$ has exactly 1 zero in $(0, 0.5)$ (note that it has at least 1 zero since $F^{(3)}(0)$ and $F^{(3)}(0.5)$ have opposite signs), then it implies that $F \le 0$. Our strategy is to prove that $F^{(3)}$ is concave, so that based on the values it takes at $0$ and $0.5$, it must have exactly 1 root in $(0, 0.5)$. To this end, we compute the fifth derivative of $F$:

$$F^{(5)}(p) = \frac{2}{(1-p)^5p^5}\big(P_1(p)\log(p) + P_2(p)\log(1-p) + P_3(p)\big), \tag{147}$$

where

$$P_1(p) = 2p^2(2-10p+20p^2-11p^3+7p^4-2p^5), \tag{148}$$

$$P_2(p) = 2(1-p)^2(6-15p+9p^2+3p^3-3p^4+2p^5), \tag{149}$$

$$P_3(p) = p(12-49p+70p^2-25p^3-12p^4+4p^5). \tag{150}$$

Claim A.4. $P_1(p) \ge 0$, $P_2(p) \ge 0$ for $0 \le p \le \frac{1}{2}$.

Assuming Claim A.4 is true, we use the following polynomial approximations for $\log(p)$ and $\log(1-p)$:

$$\log(p) \le -(1-p) - \frac{(1-p)^2}{2}, \tag{151}$$

$$\log(1-p) \le -p - \frac{p^2}{2}. \tag{152}$$

Figure 4: Plot of the derivatives of $F(p)$: $F'(p)$, $F''(p)$, $F'''(p)$



Thus, since $P_1$ and $P_2$ are positive,

\[
P_1(p) \log(p) + P_2(p) \log(1-p) + P_3(p) < P_1(p) \left( -(1-p) - \frac{(1-p)^2}{2} \right) + P_2(p) \left( -p - \frac{p^2}{2} \right) + P_3(p) . \tag{153}
\]

The right hand side of the above expression simplifies to

\[
-12 (1-p)^2 (p - 0.5)^2 p^2 \left( p^2 - p + \tfrac{7}{3} \right) , \tag{154}
\]

which is immediately seen to be $< 0$ for $0 < p < \frac{1}{2}$, since $p^2 - p + \frac{7}{3} > 0$. Thus (153) and (147) give us

\[ F^{(5)}(p) < 0 , \]

which means $F^{(3)}$ is concave, thus proving Claim A.3.

Figure 5: Plots of $P_1$ and $P_2$ (horizontal axis: $p$, up to 0.5).



□ 
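The simplification from the right hand side of (153) to the factored form (154) is a polynomial identity, so it can be checked with exact rational arithmetic. The sketch below does this using only the Python standard library; polynomials are stored as coefficient lists with the constant term first, and all helper names (`poly`, `poly_mul`, `poly_add`, `trim`, `rhs_of_153`, `factored_form_154`) are hypothetical names introduced for this illustration.

```python
from fractions import Fraction

def poly(*coeffs):
    """Polynomial in p as a list of exact coefficients, constant term first."""
    return [Fraction(c) for c in coeffs]

def poly_mul(a, b):
    """Product of two coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_add(a, b):
    """Sum of two coefficient lists, padding the shorter one with zeros."""
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def trim(a):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    a = list(a)
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

# P_1, P_2, P_3 as in (148)-(150).
P1 = poly_mul(poly(0, 0, 2), poly(2, -10, 20, -11, 7, -2))
P2 = poly_mul(poly(2, -4, 2), poly(6, -15, 9, 3, -3, 2))
P3 = poly(0, 12, -49, 70, -25, -12, 4)

# The bounds (151) and (152), expanded in powers of p.
LOG_P_BOUND = poly(Fraction(-3, 2), 2, Fraction(-1, 2))  # -(1-p) - (1-p)^2/2
LOG_1MP_BOUND = poly(0, -1, Fraction(-1, 2))             # -p - p^2/2

def rhs_of_153():
    """P1 * (bound on log p) + P2 * (bound on log(1-p)) + P3, expanded."""
    total = poly_add(poly_mul(P1, LOG_P_BOUND), poly_mul(P2, LOG_1MP_BOUND))
    return trim(poly_add(total, P3))

def factored_form_154():
    """-12 (1-p)^2 (p-1/2)^2 p^2 (p^2 - p + 7/3), expanded."""
    result = poly(-12)
    for factor in (poly(1, -2, 1),                 # (1-p)^2
                   poly(Fraction(1, 4), -1, 1),    # (p-1/2)^2
                   poly(0, 0, 1),                  # p^2
                   poly(Fraction(7, 3), -1, 1)):   # p^2 - p + 7/3
        result = poly_mul(result, factor)
    return trim(result)

if __name__ == "__main__":
    print(rhs_of_153() == factored_form_154())
```

Exact fractions are used rather than floats so that agreement of the two coefficient lists is an identity check and not a numerical coincidence.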



Proof of Claim A.4. Write $P_1(p) = 2p^2 \tilde{P}_1(p)$. To show $P_1 > 0$, we need to show that

\[ \tilde{P}_1(p) = 2 - 10p + 20p^2 - 11p^3 + 7p^4 - 2p^5 > 0 . \tag{155} \]

We have $\tilde{P}_1(0) = 2$ and $\tilde{P}_1(0.5) = 1$. Thus if we show that $\tilde{P}_1$ has no real roots in $(0, 0.5)$, we'll be done.
We show this using Sturm's theorem [33]. Using Mathematica, we construct the Sturm sequence,
which is

\[
\begin{aligned}
g_0 &= 2 - 10p + 20p^2 - 11p^3 + 7p^4 - 2p^5 , \\
g_1 &= -10 + 40p - 33p^2 + 28p^3 - 10p^4 , \\
g_2 &= -\left( \frac{3}{5} - \frac{12p}{5} + \frac{369p^2}{50} - \frac{12p^3}{25} \right) , \\
g_3 &= -\left( -\frac{2675}{16} + \frac{2625p}{4} - \frac{61325p^2}{32} \right) , \\
g_4 &= -\left( \frac{4436544}{150430225} - \frac{16965504p}{150430225} \right) , \\
g_5 &= -\frac{31638033631325}{249850977408} .
\end{aligned}
\]

Evaluating the above sequence at $0$, we get the signs $(+, -, -, +, -, -)$, which has 3 sign changes.
Evaluating at $\frac{1}{2}$, we get the signs $(+, +, -, +, +, -)$. Since this also has 3 sign changes, we are
assured that $\tilde{P}_1$ has no real roots in $[0, \frac{1}{2}]$.
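The sign counts above can be reproduced without a computer algebra system. The following sketch builds a Sturm chain by the standard construction (each term is the negated polynomial remainder of the two preceding terms) using exact rationals and counts sign changes at the endpoints. It is an independent illustration, not the Mathematica computation used in the paper; the helper names are hypothetical, and here coefficient lists are stored highest degree first.

```python
from fractions import Fraction

def poly_eval(coeffs, x):
    """Evaluate a polynomial (highest-degree coefficient first) by Horner's rule."""
    acc = Fraction(0)
    for c in coeffs:
        acc = acc * x + c
    return acc

def poly_derivative(coeffs):
    """Derivative of a polynomial given highest-degree-first coefficients."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]

def poly_remainder(num, den):
    """Remainder of num divided by den (den has a nonzero leading coefficient)."""
    num = list(num)
    while len(num) >= len(den):
        factor = num[0] / den[0]
        for i in range(len(den)):
            num[i] -= factor * den[i]
        num.pop(0)  # the leading coefficient has been cancelled
    while num and num[0] == 0:
        num.pop(0)  # strip any further leading zeros
    return num

def sturm_chain(coeffs):
    """Sturm chain g_0 = f, g_1 = f', g_{k+1} = -rem(g_{k-1}, g_k)."""
    chain = [[Fraction(c) for c in coeffs]]
    chain.append(poly_derivative(chain[0]))
    while True:
        rem = poly_remainder(chain[-2], chain[-1])
        if not rem:
            break
        chain.append([-c for c in rem])
    return chain

def sign_changes(chain, x):
    """Number of sign changes in the chain evaluated at x, ignoring zeros."""
    values = [poly_eval(g, x) for g in chain]
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

# The quintic in (155), highest degree first.
P1_TILDE = [-2, 7, -11, 20, -10, 2]

if __name__ == "__main__":
    chain = sturm_chain(P1_TILDE)
    print(sign_changes(chain, Fraction(0)), sign_changes(chain, Fraction(1, 2)))
```

By Sturm's theorem, the difference between the two counts is the number of distinct real roots in $(0, \frac{1}{2}]$, so equal counts mean there are none. The same functions can be applied to any other square-free polynomial by changing the coefficient list.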

Similarly, we write $P_2(p) = 2(1-p)^2 \tilde{P}_2(p)$, for which we need to show

\[ \tilde{P}_2(p) = 6 - 15p + 9p^2 + 3p^3 - 3p^4 + 2p^5 > 0 . \tag{156} \]

$\tilde{P}_2$ takes values 6 and 1 at $0$ and $0.5$ respectively. Again, constructing the Sturm sequence for $\tilde{P}_2$
we get

\[
\begin{aligned}
g_0 &= 6 - 15p + 9p^2 + 3p^3 - 3p^4 + 2p^5 , \\
g_1 &= -15 + 18p + 9p^2 - 12p^3 + 10p^4 , \\
g_2 &= -\left( \frac{51}{10} - \frac{273p}{25} + \frac{297p^2}{50} + \frac{12p^3}{25} \right) , \\
g_3 &= -\left( \frac{45675}{32} - \frac{50825p}{16} + \frac{61325p^2}{32} \right) , \\
g_4 &= -\left( -\frac{2505792}{30086045} + \frac{16965504p}{150430225} \right) , \\
g_5 &= \frac{31638033631325}{249850977408} .
\end{aligned}
\]

Evaluating at $0$ gives the sign sequence $(+, -, -, -, +, +)$, and evaluating at $0.5$ gives the sign
sequence $(+, -, -, -, +, +)$. Since they have the same number of sign changes, we conclude that
$\tilde{P}_2$ has no zeros in $[0, 0.5]$. This proves Claim A.4 and completes the proof of Lemma 2.1.

□